DIGITAL  IMAGE  ANALYSIS  OF  THE  LARYNX 


By 

YASER  S.  NATOUR 


A DISSERTATION  PRESENTED  TO  THE  GRADUATE  SCHOOL 
OF  THE  UNIVERSITY  OF  FLORIDA  IN  PARTIAL  FULFILLMENT 
OF  THE  REQUIREMENTS  FOR  THE  DEGREE  OF 
DOCTOR  OF  PHILOSOPHY 

UNIVERSITY  OF  FLORIDA 


2001 


Copyright  2001 


by 


Yaser  S.  Natour 


ACKNOWLEDGMENTS 


I give  special  acknowledgment  to  Dr.  Christine  Sapienza,  the  chair  of  my 
dissertation  committee.  She  devoted  time,  effort,  guidance  and  patience  throughout  the 
dissertation  process.  She  has  been  a model  of  scientific  integrity,  devotion,  and 
professionalism. I also sincerely appreciate the efforts of my committee members, Dr.
William S. Brown, Jr., Dr. Howard Rothman, and Dr. Paul Davenport. I also thank Dr.
Mark  Schmalz,  the  special  member  of  my  committee,  for  providing  expertise  in  digital 
image  analysis  and  image  algebra. 

I thank  my  wife  Basima,  who  has  been  patient  enough  to  support  me  in  times  of 
stress  and  need.  I also  thank  my  colleagues  Judith  Wingate,  Bari  Hoffman,  Ahmad 
Saleem,  Viviane  Marino,  Mark  Brunner,  Andy  Nott,  Noah  Satndrige,  Mohammed  Khairy 
and  Karen  Wheeler  whose  help  and  support  were  greatly  appreciated. 

Finally, I extend my sincere appreciation to my mentors, the Fulbright Foundation
and  the  University  of  Jordan  for  funding  my  work  and  providing  me  with  the  opportunity 
to  continue  my  education. 




TABLE OF CONTENTS

ACKNOWLEDGMENTS
ABSTRACT
INTRODUCTION AND REVIEW OF THE LITERATURE
    Introduction
    Digital Image
    Digital Image Processing
    Digital Image Analysis
    Digitization
    Image Histogram
    Fourier Transformation
    Noise
    Image Segmentation
    Pointwise Operations
    Thyroplasty
    The History of Laryngeal Visualization
    Laryngeal Photography
    Endoscopy
    Videolaryngostroboscopy
    Quantification of Endoscopic Images
    Statement of the Problem
    Pilot Study
    Purpose
METHODOLOGY
    Software Packages
    Adobe Photoshop
    Gimp
    Scion Image
    Subjects
    Equipment and Procedures
    Adobe Photoshop
    Gimp
    Scion Image
    Statistical Analysis
RESULTS
    Descriptive Statistics
    Inferential Statistics
    Normalized Glottal Gap Area (NGGA)
    Between subject factors
    Within subject factors
    Total Vocal Fold Length (TVFL)
    Between subject factors
    Within subject factors
    Post hoc Tests
    Normalized Glottal Gap Area (NGGA)
    Total Vocal Fold Length (TVFL)
    Analyst by condition interaction
    Program by condition interaction
    Program by analyst interaction
DISCUSSION
    Normalized Glottal Gap Area (NGGA)
    Effect of Condition
    Effect of Program
    Effect of Analyst by Condition Interaction
    Total Vocal Fold Length (TVFL)
    Effect of Condition
    Effect of Program by Analyst Interaction
    Effect of Program by Condition Interaction
    Effect of Analyst by Condition Interaction
    General Observations
    Clinical Implications
    Comparison with Pilot Study
    Generalizations of Findings
    Strengths and Limitations
    Strengths
    Limitations
    Suggestions for Further Research
    Summary and Conclusion
REFERENCES
BIOGRAPHICAL SKETCH


Abstract  of  Dissertation  Presented  to  the  Graduate  School 
of  the  University  of  Florida 
in  Partial  Fulfillment  of  the  Requirements  for  the 
Degree  of  Doctor  of  Philosophy 

DIGITAL  IMAGE  ANALYSIS  OF  THE  LARYNX 

By 

Yaser  S.  Natour 
May  2001 

Chairperson: Christine M. Sapienza

Major  Department:  Communication  Sciences  and  Disorders 

Objective  quantification  of  normal  and  pathological  laryngeal  dimensions  using 
videolaryngostroboscopic  images  has  been  an  important  clinical  issue  for  speech- 
language  pathologists.  It  is  necessary  to  have  a tool  designed  to  objectively  document 
changes  in  laryngeal  structure  with  treatment.  Documenting  changes  in  laryngeal 
structure  as  a function  of  a particular  surgical  procedure  or  medication  would  assist 
physicians  and  speech-language  pathologists  in  documenting  the  outcomes  of  their 
therapy  regimens.  The  current  and  most  widely  used  method  for  documenting  the  status 
of  the  laryngeal  structure  is  endoscopy,  with  video  recording  of  digital  imagery  for 
archival  purposes.  Unfortunately,  the  in  vivo  determination  of  laryngeal  anatomy  and 
dimensions  has  been  seriously  hampered  by  difficulties  with  image  distortion  from  the 
endoscopic  technique.  Currently,  researchers  and  clinicians  are  using  commercially 
available  products  to  assist  in  image  analysis  of  the  larynx.  Unfortunately,  the 
commercially available imaging software packages (for example, NIH Image and Adobe
Photoshop) have serious inherent limitations that remain unresolved, such as the lack of
automated glottal gap tracing, the inability to account for color and illumination
distortion, and the inability to account for the problems of a multi-pixel boundary
gradient. This mixed-design repeated-measures study examined three imaging software
packages  for  their  usefulness  in  laryngeal  image  analysis.  Specifically,  Adobe  Photoshop, 
Gimp,  and  Scion  Image  were  examined  in  terms  of  inter-program  and  inter-analyst 
consistency  in  the  measurement  of  glottal  gap  area  (GGA),  and  total  vocal  fold  length 
(TVFL).  Three  endoscopic  image  frames  of  a normal  subject's  larynx  and  three 
endoscopic  image  frames  of  a patient  with  the  appropriate  diagnosis  of  adductor  vocal 
fold  paralysis  during  voicing  were  used  as  the  samples  for  analysis.  One  condition  was 
analyzed:  maximum  opening  during  voicing.  Each  image  frame  of  the  subjects'  vocal 
folds  was  measured  ten  times  by  three  analysts,  two  experts  and  one  naive. 

The findings of this study for GGA measurement suggest that the program
used  and  the  condition  have  an  effect  on  the  consistency  of  the  results.  Also,  the 
combination  of  analyst  and  condition  has  an  effect  on  such  measurement.  Measurement 
results  for  TVFL  suggest  that  there  were  combined  effects  of  analyst  and  condition, 
program  and  condition,  and  program  and  analyst  on  the  consistency  of  the  resulting 
measurements. 

This  study  recommends  investigating  Scion  Image  thoroughly  as  a potential 
platform  for  the  development  of  the  new  image  analysis  tool.  The  design  of  the  new  tool 
should  analyze  the  effects  of  image  noise  as  well  as  computational  and  systematic  error 
on  the  accuracy  of  measured  results. 




CHAPTER  1 

INTRODUCTION  AND  REVIEW  OF  THE  LITERATURE 

Introduction 

Quantifying  the  dimensions  of  the  laryngeal  structure  and  the  movement  of  the 
vocal  folds  has  been  an  important  issue  for  both  clinicians  and  voice  scientists.  Imaging 
is  one  means  for  documenting  the  physical  status  of  the  vocal  folds,  and  along  with 
adjunct  procedures,  can  be  used  to  define  vocal  fold  movement  (Saadah  et  al.  1997). 
Laryngeal  images  are  commonly  obtained  through  routine  endoscopic  examination  of  the 
larynx.  Endoscopy,  as  a general  term,  is  the  visualization  of  the  internal  parts  of  the  body 
by  use  of  an  endoscope.  Laryngeal  endoscopy  is  the  direct  visualization  of  the  laryngeal 
structures  via  the  introduction  of  an  endoscope  into  the  mouth  or  through  the  nose 
(Zemlin  1998).  Endoscopy,  with  video  recording  of  digital  imagery  for  archival  purposes, 
provides a record of the physical structures of the larynx. Videostroboscopy, a record of
the  vocal  folds  in  motion,  is  the  current  and  most  widely  used  adjunct  method  for 
documenting  vocal  fold  function.  Currently,  clinical  interpretation  of  videostroboscopic 
images  follows  a well-defined  protocol  that  depends  on  visualization  and  listening,  i.e. 
visualizing  the  laryngeal  structures,  noting  their  function  and  listening  to  the  emitted 
sound  that  results  from  the  vibration  of  the  vocal  folds  (Bless  1991,  Bless  et  al.  1987, 
Kitzing  1985,  Stemple  2000). 

Quantification  of  laryngeal  function  and  structure  through  objective  measurement 
of laryngeal dimensions using endoscopic images has been proposed in the open
literature. In the area of speech-language pathology, inexpensive quantification of
endoscopic images has been promoted as sufficient for quantifying laryngeal dimensions
(Gonçalves & Leonard 1998). The methods call for inexpensive, commercial off-the-shelf
(COTS) digital image processing hardware, as well as software.

The  area  of  digital  image  processing  and  analysis  seems  promising  as  it  carries 
the  potential  for  objectifying  endoscopic  findings.  Hence,  a review  of  the  important 
concepts  related  to  the  field  of  digital  imaging  is  essential  in  order  to  familiarize  the 
reader  with  the  concepts  of  this  process. 

Digital  Image 

A digital  image  is  typically  a two-dimensional  projection  of  a three-dimensional 
scene  into  an  m x n element  focal  plane  (i.e.,  a matrix  where  m represents  the  number  of 
rows  and  n represents  the  number  of  columns).  A digital  image,  in  other  words,  is  a two- 
dimensional  representation  of  an  object  sampled  in  an  equally  spaced  rectangular  grid 
pattern,  and  quantized  in  equal  intervals  of  intensity,  also  called  gray  level  (Russ  1999). 
Digital  images  are  suitable  for  manipulation  by  computers,  using  algorithms  that  extract 
application-specific  parameters  (Ulrich  1993). 

Digital  Image  Processing 

Digital  image  processing  subjects  numerical  representations  of  images  to  a series 
of  operations  in  order  to  obtain  a desired  result.  In  laryngoscopy,  the  purpose  of  digital 
image  processing  is  mainly  to  improve  the  appearance  of  an  image  to  the  human  observer 
and to prepare images for the measurement of structures of interest (Russ 1999, Ulrich
1993).

Digital  Image  Analysis 

Digital  image  analysis  is  a process  that  transforms  a digital  image  into  a set  of 
measurement  data  or  a decision;  for  example,  extracting  a part  of  the  image  for 
measurement  (Castleman  1979,  Russ  1999). 

Digitization 

Digitization  is  the  process  of  converting  a continuous  input  into  a discrete  output. 
Images are made up of picture elements, or pixels. Each pixel has a location (a line or
row number and a sample or column number) at which the image brightness is sampled
and quantized. Digitization generates an integer at each pixel, called the gray level, that
represents the brightness or darkness of the image at that location. Together these
integers form a rectangular array. The digitization
produces  an  array  of  digital  data  that  is  ready  to  be  computer  processed.  This  array  is  then 
transferred  to  the  computer  memory  to  be  stored  until  the  user  instructs  the  computer  to 
call  up  and  execute  an  image  processing  program.  During  execution,  the  image  is  read 
into  the  computer  line  by  line.  Operating  upon  one  image,  the  computer  generates  the 
output  image  pixel  by  pixel.  After  processing,  the  final  product  is  displayed.  The  gray 
level  of  each  pixel  is  used  to  determine  the  brightness  or  darkness  of  the  corresponding 
point  on  a display  screen. 

The  digitization  process  consists  of  the  following  steps:  transduction, 
quantization,  scanning,  and  sampling.  Transduction  is  a process  that  converts  one 
physical  quantity  to  another,  such  as  converting  light  energy  to  electrical  energy. 
Quantization  is  the  representation  of  a measured  value  by  an  integer.  Scanning  is  the 
selective addressing of specific locations within the domain of an image. Each of the
small subregions addressed in scanning is a picture element (pixel). Sampling is
measuring  the  gray  level  of  an  image  at  a pixel  location.  To  display  the  image,  the  same 
steps  are  used,  but  in  reverse  order.  With  image  display,  there  is  a one-to-one 
correspondence  between  each  input  and  each  output  pixel  (picture  element).  The  resulting 
gray-level  value  of  the  output  image  is  stored  in  the  corresponding  output  point.  These 
values are stored in an equally spaced rectangular grid pattern that is divided into horizontal and
vertical  directions,  each  of  which  contains  a finite  number  of  square  pixels.  Figure  1-1 
shows  an  example  of  a grid  pattern  superimposed  on  a laryngoscopic  image.  The  number 
of  pixels  contained  can  be  large  (e.g.,  65536  for  a 256  x 256  image).  This  number  affects 
image  resolution.  That  is,  the  larger  the  number  of  pixels,  the  better  the  image  resolution 
(Castleman  1979).  The  pixel  values  are  sampled  through  analog-to-digital  conversion 
(A/D),  where  the  total  number  of  pixels  sampled  by  the  A/D  conversion  across  the 
horizontal  line  determines  the  horizontal  dimensions  of  the  digital  matrix  of  the  image. 
The  vertical  dimension  equals  the  number  of  vertical  lines  digitized.  The  range  of 
magnitudes  of  the  numbers  generated  by  the  A/D  conversion  determines  the  contrast  or 
gray  scale  resolution  of  the  digital  images  (Price  1997). 
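
The sampling and quantization steps described above can be illustrated with a minimal sketch. It assumes a hypothetical continuous scene (a simple left-to-right brightness ramp with values between 0 and 1) and 256 gray levels; the function names `digitize` and `scene` are illustrative only and do not correspond to any software package used in this study.

```python
def digitize(scene, rows, cols, levels=256):
    """Sample a continuous brightness function on a rows x cols grid and
    quantize each sample to an integer gray level in [0, levels - 1]."""
    image = []
    for r in range(rows):                  # scanning: address each line
        line = []
        for c in range(cols):              # ...and each sample on that line
            # sampling: measure brightness at the centre of the pixel
            x = (c + 0.5) / cols
            y = (r + 0.5) / rows
            brightness = min(1.0, max(0.0, scene(x, y)))
            # quantization: represent the measured value by an integer
            line.append(min(levels - 1, int(brightness * levels)))
        image.append(line)
    return image


def scene(x, y):
    """Hypothetical continuous scene: brightness rises from left to right."""
    return x


image = digitize(scene, rows=256, cols=256)
pixel_count = len(image) * len(image[0])   # number of pixels in the array
```

For a 256 x 256 grid the array holds 65536 integers, which matches the pixel count quoted above. A finer grid or more gray levels would give better spatial or gray-scale resolution, respectively.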

The  purpose  of  A/D  conversion  is  to  simplify  numerous  measurements  and 
processing  operations.  For  example,  using  a finite  pixel  area,  region  boundaries  can  be 
defined  using  the  intermediate  brightness  value  resulting  from  averaging  image 
brightness  values  within  a given  region.  If  the  gray  level  values  along  both  sides  are 
similar,  the  boundary  can  be  assumed  to  have  some  geometric  shape,  i.e.,  a line  (Russ 
1999).  To  perform  image  acquisition  operations,  three  elements  should  be  available:  a 
computer, an image digitizer, and an image display device (Castleman 1979). Digital
image processing performs a number of operations on the numerical representations of
objects to obtain certain planned results (e.g., image enhancement and image
segmentation).


Figure  1-1.  A grid  pattern  superimposed  on  a laryngeal  image. 

Image  Histogram 

An  image  histogram  is  a plot  of  the  number  of  pixels  at  a particular  gray  level 
versus  the  gray-level  value.  The  abscissa  is  the  gray  level  and  the  ordinate  is  the 
frequency  of  occurrence  (number  of  pixels).  The  shape  and  the  plotted  values  of  an 
image histogram provide significant information about the image. For example, when
most of the pixel values are contained within the lower half of the histogram display
levels, the image has low overall brightness (it appears dim). If the pixel values are
concentrated in the upper half of the histogram display levels, the image has high overall
brightness (it appears bright). Contrast, in turn, corresponds to brightness differences
among the pixels of the image. Generally, a narrow grayscale distribution indicates a
low-contrast image and a wide grayscale distribution indicates a high-contrast image.

Histograms can also be transformed (Price 1997). Figure 1-2a shows a
laryngoscopic image whose appearance is considered bright. Figure 1-2b shows the
corresponding histogram of the image in Figure 1-2a, with its pixels scattered toward the
upper half of the grayscale. In contrast, Figure 1-3a shows the same
laryngoscopic image with a dim appearance. Figure 1-3b shows the corresponding
histogram of the image in Figure 1-3a, with its pixels concentrated in the lower half
of the grayscale. Although a histogram specifies the number of pixels at each gray level,
it does not specify the location of those pixels within the image. Thus each image has
exactly one histogram, but the converse is not true: vastly different images could have
identical histograms (Castleman 1979).
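
As a minimal sketch of this point, the following example counts the pixels at each gray level for two small, hypothetical images. The arrays are invented for illustration only; they show that two different spatial arrangements can produce the same histogram.

```python
def histogram(image, levels=256):
    """Count the number of pixels at each gray level."""
    counts = [0] * levels
    for row in image:
        for gray in row:
            counts[gray] += 1
    return counts


# Two hypothetical 2 x 4 images containing the same gray levels
# in different locations.
image_a = [[0, 0, 255, 255],
           [0, 0, 255, 255]]
image_b = [[0, 255, 0, 255],
           [255, 0, 255, 0]]

hist_a = histogram(image_a)
hist_b = histogram(image_b)
same_histogram = hist_a == hist_b      # histograms agree
same_image = image_a == image_b        # pixel locations do not
```

Because the histogram discards pixel location, it cannot by itself be used to recover the shape or position of a structure.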

Image  histograms  can  be  useful  in  different  ways.  Through  plotting  an  image 
histogram,  the  analyst  can  detect  scaling  errors  of  the  image  within  the  available  range  of 
gray  levels.  If  the  image  were  digitized  at  low  contrast,  the  range  of  the  gray-level 
histogram  would  appear  somewhat  limited  (occupying  an  area  less  than  the  whole  range 
of  256  gray  levels).  Although  some  image  details  could  be  lost  because  of  low  image 
contrast,  detection  of  this  effect  through  examining  the  histogram  would  prompt  the 
analyst  to  perform  certain  operations  that  distribute  the  available  pixels  over  the  whole 
range  of  gray  levels  (e.g.,  histogram  equalization).  Given  the  histogram,  an  optimal 
threshold gray level for the object could be determined and its area computed. Having a
histogram with two peaks (e.g., a histogram of a dark object on a light background)
would guide the analyst in determining the threshold of that image. A threshold gray level
in the area between the two peaks could represent a reasonable object boundary, helping
the analyst in image segmentation.
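
One simple way to place a threshold between two histogram peaks is sketched below. This is an illustrative heuristic under stated assumptions, not the procedure used by any of the packages examined in this study: it assumes the two peaks lie at least `min_separation` gray levels apart, and the histogram values are synthetic.

```python
def valley_threshold(hist, min_separation=32):
    """Return the least-populated gray level between the two dominant
    histogram peaks (a simple heuristic, not an optimal method)."""
    levels = range(len(hist))
    peak1 = max(levels, key=lambda g: hist[g])
    # second peak: most populated level far enough from the first
    far = [g for g in levels if abs(g - peak1) >= min_separation]
    peak2 = max(far, key=lambda g: hist[g])
    lo, hi = sorted((peak1, peak2))
    return min(range(lo, hi + 1), key=lambda g: hist[g])


# Synthetic bimodal histogram (hypothetical data): a dark object peaking
# near gray level 40 and a lighter background peaking near 200.
hist = [max(0, 50 - 2 * abs(g - 40)) + max(0, 80 - 2 * abs(g - 200))
        for g in range(256)]

threshold = valley_threshold(hist)
object_area = sum(hist[:threshold])    # pixels darker than the threshold
```

Once a threshold has been chosen, the area of the dark object follows directly from the histogram as the number of pixels below that gray level.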

Fourier  Transformation 

Fourier transformation is an operation that transforms a complex function of a
finite number n of real variables into another complex function of n real variables. Fourier
transformation can be used to express preservation of detail in an image during analysis.
In linear system analysis, the Fourier transformation allows an analyst to quantify the
effects of digitizing systems, sampling region configuration, electronic amplifiers, filters,
noise, and display region configuration. The Fourier transformation is used to
decompose a function f(x) or f(x, y) into an accumulated series of sine functions of the form
sin(ωx + φ), where ω denotes frequency and φ denotes phase. The accumulated sine
waves, in turn, can be manipulated to produce a shift in frequency and/or a shift in phase,
which is useful in image signal compression or stretching.


Figure 1-2. Histogram manipulation: (a) a laryngoscopic image with a bright appearance,
and (b) the corresponding image histogram with pixels scattered toward the
upper 50 percentile of the grayscale.


Figure  1-3.  Histogram  manipulation  (a)  a laryngoscopic  image  appearing  dim,  and  (b) 
the  corresponding  image  histogram  with  pixels  concentrated  in  the  lower  50 
percentile  of  the  grayscale. 
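
The decomposition into sine functions described under Fourier Transformation can be sketched with a direct discrete Fourier transform of a one-dimensional signal. The example below is a minimal illustration that assumes a hypothetical test signal consisting of a single sine wave with 3 cycles and a phase of 0.5 radians; the function `dft` is written out for clarity and is far slower than the fast algorithms used in practice.

```python
import cmath
import math


def dft(samples):
    """Direct discrete Fourier transform of a real or complex sequence."""
    n = len(samples)
    return [sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n))
            for k in range(n)]


# Hypothetical test signal: sin(wx + phase) sampled at n points.
n = 64
cycles = 3
phase = 0.5
signal = [math.sin(2 * math.pi * cycles * t / n + phase) for t in range(n)]

spectrum = dft(signal)
# strongest component among the positive frequencies
k_peak = max(range(1, n // 2), key=lambda k: abs(spectrum[k]))
amplitude = 2 * abs(spectrum[k_peak]) / n
# a sine of phase p appears in the spectrum with angle p - pi/2
recovered_phase = cmath.phase(spectrum[k_peak]) + math.pi / 2
```

The transform recovers the frequency, amplitude, and phase of the sine component. The same idea extends to two dimensions for images, where each component has a horizontal and a vertical frequency.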


Noise 

Noise is an unknown perturbing signal. Random noise in an image often
appears as low-intensity yet potentially rapid variation in gray scale (i.e., high
frequency) (Castleman 1979). Noise can be introduced into an image in a number of
ways, such as variation in the signal-to-noise ratio (SNR) of the sensor used in acquiring
the image. When a signal is recorded, noise is superimposed on the recorded signal.
Although the noise source could be determined, the main difficulty remains in expressing
the noise variable mathematically. Some aspects of noise can be recognized; however,
noise cannot be determined completely and in detail. That leaves the analyst with only
one choice, namely, to treat noise as a random variable. Identifying the particular noise
function among the group of possible functions is nearly impossible. The analyst can only
make generalizations as to the group of functions as a whole. A random variable can be
approached by computing its average. This can be done in two ways: either by time
averaging or by averaging the values of all member functions evaluated at some particular
point in time. The latter technique produces a group average (Castleman 1979). Figure
1-4a shows a laryngoscopic image of a normal adult male's vocal folds during breathing.
Figure 1-4b shows the same image with uniform noise added synthetically (i.e., using an
image processing operation) at a value of 64 (noise distributed using random numbers
between -64 and +64). Noisy images can result from a number of factors, including
image sensor nonlinearity (i.e., when the sensor is not responding
linearly to different light intensities of the image). Geometric distortion of the lenses is
another source of noise. Lens distortion may cause the corners and edges of the image to
be dim (Russ 1999).


Figure  1-4.  Image  noise  (a)  a laryngoscopic  image  of  a normal  adult  male's  vocal  folds 
during  breathing,  and  (b)  the  same  image  with  uniform  noise  synthetically 
added. 
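
The synthetic noise described above, and the group average used to approach a random variable, can be sketched as follows. The example assumes a hypothetical flat gray image (gray level 128), a noise value of 64 as in Figure 1-4b, and 50 repeated noisy frames; the frame count and the fixed random seed are arbitrary choices made so that the sketch is repeatable.

```python
import random


def add_uniform_noise(image, amplitude, rng):
    """Add integer noise drawn uniformly from [-amplitude, +amplitude],
    clipping the result to the 0-255 gray scale."""
    return [[min(255, max(0, g + rng.randint(-amplitude, amplitude)))
             for g in row] for row in image]


def average_frames(frames):
    """Average the same pixel across all frames (a group average)."""
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[sum(f[r][c] for f in frames) / len(frames)
             for c in range(cols)] for r in range(rows)]


def mean_abs_error(image, reference):
    """Mean absolute gray-level difference between two images."""
    diffs = [abs(a - b) for row_a, row_b in zip(image, reference)
             for a, b in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


rng = random.Random(0)                       # fixed seed for repeatability
clean = [[128] * 16 for _ in range(16)]      # hypothetical flat image
frames = [add_uniform_noise(clean, 64, rng) for _ in range(50)]

single_error = mean_abs_error(frames[0], clean)
averaged_error = mean_abs_error(average_frames(frames), clean)
```

Averaging the frames brings each pixel closer to its noise-free value, which is why the error of the averaged image is smaller than that of any single noisy frame.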


Image  Segmentation 

Image segmentation involves differentiating regions in a digital image by their
mathematical properties (e.g., an object-background discontinuity). Segmentation
algorithms are designed to recognize an object's features that are not in common with its
background (e.g., geometric, textural, and intensity differences). One common method is
segmentation by thresholding (i.e., separating structures on the basis of gray level)
(Castleman 1979). For example, laryngeal cartilages can be represented as all pixels with a
gray-level value above +150 out of a possible 255 gray levels, and vocal folds as all pixels
between -150 and +150 out of a possible 255 gray levels. This method defines a range of
brightness values, and then selects the pixels whose gray-level values fall within that range
as background. Figure 1-5 shows an example of thresholding a laryngoscopic image. The
benefit of thresholding this image is to distinguish the vocal folds from the glottal gap by
brightness level, which makes measurement of dimensions, such as total vocal fold length,
relatively easy.
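
A minimal sketch of segmentation by thresholding is given below. It assumes a small hypothetical image in which the glottal gap is dark (gray level 20) and the surrounding tissue is brighter (gray level 180); the gray-level range of 0 to 60 is an arbitrary illustrative choice, and the measurements are in pixels rather than physical units.

```python
def segment(image, low, high):
    """Return a binary mask: 1 where low <= gray level <= high, else 0."""
    return [[1 if low <= g <= high else 0 for g in row] for row in image]


def area(mask):
    """Number of selected pixels."""
    return sum(sum(row) for row in mask)


def vertical_extent(mask):
    """Number of rows spanned by the selected region."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    return rows[-1] - rows[0] + 1 if rows else 0


# Hypothetical 6 x 5 image: dark gap surrounded by brighter tissue.
image = [
    [180, 180, 180, 180, 180],
    [180, 180,  20, 180, 180],
    [180,  20,  20,  20, 180],
    [180,  20,  20,  20, 180],
    [180, 180,  20, 180, 180],
    [180, 180, 180, 180, 180],
]

gap_mask = segment(image, low=0, high=60)
gap_area = area(gap_mask)                 # area of the selected region, in pixels
gap_length = vertical_extent(gap_mask)    # length along the image rows, in pixels
```

In a real endoscopic frame the boundary is spread over several pixels, so the measured area and length depend on where the gray-level range is set.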


Figure 1-5. Thresholding a laryngoscopic image. Labeled regions: glottal gap
(background) and vocal folds.

Pointwise Operations

Contrast enhancement, contrast stretching, and gray-level manipulations are
pixel-level operations that can be used to improve subjective image appearance. Many of
these are pointwise operations, in which each output pixel value depends only on the
input pixel value at the same location. A point operation can be represented
mathematically as:

B(x, y) = f[A(x, y)]

where A(x, y) and B(x, y) are the input and output image values at coordinates (x, y), and f
is the function that specifies the mapping of A(x, y) to B(x, y). Pointwise operations are used
to  overcome  digitizer  limitations  before  the  actual  processing  begins.  For  example, 
pointwise  operations  can  be  used  to  remove  the  effects  of  image  sensor  nonlinearity 
(which  creates  noise).  If  an  image  has  been  digitized  by  an  instrument  that  has  nonlinear 
response  to  light  intensity,  a pointwise  operation  can  transform  the  gray  scale  of  the 
image  to  an  output  gray  level  with  equal  increments  in  light  intensity.  Another  example  is 
using  a pointwise  operation  to  transform  the  units  of  the  gray  scale  to  an  output  image  in 
which  gray  levels  represent  equal  steps  in  optical  density.  Pointwise  operations  can  also 
be  used  before  image  display.  Many  display  devices  do  not  maintain  a linear  relationship 
between  the  gray  level  of  a pixel  in  the  digital  image  and  the  brightness  of  the 
corresponding point on the display screen. A pointwise operation applied before display
can be designed so that it and the display nonlinearity cancel each other out, preserving
linearity in the displayed image.
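
The correction of a nonlinear response by a pointwise operation can be sketched with a lookup table. The example assumes a hypothetical sensor whose recorded gray level follows a power law with exponent 0.5; the exponent, the function names, and the test intensities are illustrative assumptions only.

```python
GAMMA = 0.5   # hypothetical sensor exponent (an assumption for this sketch)


def sensor(intensity):
    """Hypothetical nonlinear sensor: light intensity in [0, 1] to gray level."""
    return round(255 * intensity ** GAMMA)


def build_lut(mapping, levels=256):
    """Tabulate a pointwise mapping f for every possible input gray level."""
    return [min(levels - 1, max(0, round(mapping(g)))) for g in range(levels)]


def apply_lut(image, lut):
    """Pointwise operation: B(x, y) = f[A(x, y)] via table lookup."""
    return [[lut[g] for g in row] for row in image]


# Inverse of the sensor response, so that gray level becomes
# proportional to light intensity again.
linearize = build_lut(lambda g: 255 * (g / 255) ** (1 / GAMMA))

intensities = [0.0, 0.25, 0.5, 0.75, 1.0]         # equal steps in intensity
recorded = [[sensor(i) for i in intensities]]     # unequal gray-level steps
corrected = apply_lut(recorded, linearize)        # approximately equal steps
```

Because the mapping depends only on the input gray level, it can be computed once for all 256 levels and then applied to every pixel. The same approach could compensate for a nonlinear display device by applying the inverse of the display response before the image is shown.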

Display  devices  usually  have  a range  of  gray  levels  over  which  the  image  features 
are visually discernible. When an analyst is trying to differentiate darker and lighter
features  on  an  image  display,  a pointwise  operation  can  be  used  to  ensure  that  the  features 
of  interest  fall  into  the  preferred  range  of  gray-level  display.  When  features  of  an  image 
occupy  a relatively  narrow  range  of  grayscale,  a pointwise  operation  could  be  used  to 
expand  the  contrast  of  the  features  of  interest.  The  result  would  be  features  that  occupy 
the  entire  displayed  range  of  gray  level.  A pointwise  operation  could  also  divide  an  image 
into  differentiated  regions  on  the  basis  of  gray  levels  (thresholding).  This  process  is 
useful for defining boundaries in noisy images (Castleman 1979).
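
Expanding a narrow range of gray levels to fill the displayed range can be sketched as a linear contrast stretch. The example assumes a small hypothetical image whose gray levels lie between 100 and 150; the function name `stretch` is illustrative.

```python
def stretch(image, levels=256):
    """Linearly map the occupied gray-level range onto [0, levels - 1]."""
    flat = [g for row in image for g in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:                       # flat image: nothing to stretch
        return [row[:] for row in image]
    scale = (levels - 1) / (hi - lo)
    return [[round((g - lo) * scale) for g in row] for row in image]


# Hypothetical low-contrast image occupying gray levels 100 to 150.
image = [[100, 110, 120],
         [130, 140, 150]]

stretched = stretch(image)
```

After stretching, the darkest pixel maps to 0 and the brightest to 255, while the ordering of gray levels is preserved.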


Many of the operations discussed above are used by a number of imaging software
packages such as Adobe Photoshop 5.5 (Adobe Systems Inc., CA), Gimp version
0.99.X/1.0 (Image Manipulation Program, Experimental Computing Facility, University
of California at Berkeley), and NIH Image (the Macintosh version of Scion Image,
National Institutes of Health, Bethesda, MD). However, these software packages have
possible limitations that are discussed further in this study.

Using  NIH  Image  to  quantify  laryngeal  dimensions  has  become  commonplace 
among  speech-language  pathologists,  otolaryngologists,  and  voice  scientists.  Examples  of 
studies based on NIH Image include Gonçalves and Leonard (1998), Omori et al. (1996,
1998), and Inagi et al. (1997). In these studies, endoscopic images were used in order to
achieve acceptable quantification of laryngeal structures. Endoscopic images of the
larynx were used because speech-language pathologists and voice scientists rely on such
imagery  for  studying  laryngeal  dimensions,  especially  before  and  after  therapy  and/or 
surgery. 

This  study  uses  laryngeal  images  of  a normal  subject  and  laryngeal  images  of  a 
patient  with  unilateral  adductor  vocal  fold  paralysis.  This  voice  disorder  was  chosen 
because  its  surgical  and/or  behavioral  treatment  results  in  a change  in  glottal  gap  size. 

The  treatment  of  choice  for  adductor  vocal  fold  paralysis  is  Thyroplasty,  a surgical 
procedure  that  serves  to  lengthen  or  tense  the  vocal  folds  (Thyroplasty  Type  IV),  shorten 
them  (Thyroplasty  Type  III),  expand  them  laterally  (Thyroplasty  Type  II),  or  medialize 
them  (Thyroplasty  Type  I).  Thyroplasty  Type  I,  in  particular,  aims  at  shifting  the 
paralyzed  vocal  fold  medially  to  reduce  the  incomplete  closure  caused  by  the  paretic 
vocal  fold's  inability  to  adduct  the  glottis. 


Thyroplasty

Thyroplasty  is  defined  as  performing  surgery  on  the  laryngeal  structure  in  order  to 
alter  the  laryngeal  skeleton  and  thus  the  voice.  Isshiki  et  al.  (1974)  described  four  basic 
procedures  of  thyroplasty  that  aimed  at  lengthening  or  shortening  the  vocal  folds  and 
compressing  or  expanding  the  glottal  gap.  Thyroplasty  Type  IV  lengthens  and  tenses  the 
vocal  folds  through  the  anterior  approximation  of  the  cricoid  and  thyroid  cartilages. 
Originally,  nylon  sutures  were  used  to  stretch  and  tighten  vocal  folds,  thereby  raising 
vocal pitch. In recent years, laser scarring has been suggested as a possible alternative to
this procedure (von Leden 1991). Thyroplasty Type III is aimed at shortening the vocal
folds. In this procedure, the anterior segment of the thyroid cartilage was incised and
depressed. If lowering the vocal pitch was the target, the procedure was combined with
increasing the mass of the vocal folds through injection of material into the vocal folds.
Thyroplasty  Type  II  expands  the  vocal  folds  laterally  in  patients  with  abnormal  glottal 
gaps  after  injuries.  This  was  achieved  by  separating  the  thyroid  cartilage  anteriorly  and 
holding  the  two  parts  apart  with  a cartilage  splint.  Thyroplasty  Type  I was  designed  to 
shift  one  or  both  of  the  vocal  folds  medially.  Thyroplasty  Type  I procedure  is  the  focus 
of  this  study.  Potential  subjects  for  this  procedure  are  patients  with  dysphonia,  vocal  fold 
paralysis,  and  vocal  fold  atrophy.  In  vocal  fold  paralysis,  for  example,  the  paralyzed 
vocal fold is pushed medially, thus reducing the size of the glottal gap and achieving the
best glottic sufficiency (Bless 1991, Hiroto 1976). The surgery is conducted on an inpatient
or outpatient basis, and local anesthesia is used. A rectangular incision is made on the
thyroid cartilage at the level of the vocal fold. Initially, the procedure used the resulting
rectangular cartilaginous fragment, depressing it inward and thus compressing the
paralyzed vocal fold medially. A piece of the thyroid cartilage can be taken from the opposite side
and  used  as  a wedge,  if  necessary,  to  enhance  the  effect  of  lateral  compression  of  the 
vocal  fold.  After  surgery,  the  voice  is  generally  satisfactory.  Complications  such  as 
stridor  or  dyspnea  may  result  but  are  not  usually  reported.  As  surgical  intervention  inside 
the  thyroid  cartilage  is  minimal,  fine  adjustment  of  depression  is  possible  during  the 
surgery  (Isshiki  et  al.  1975). 

Materials used as alternatives to the autograft cartilage from the thyroid have included
silicone, nasal septum, or ribs (Blaugrund 1991, Colton & Casper 1996). Silicone is
commonly used in present-day techniques, either hand-carved by the surgeon to produce
the best voicing or as a ready-made implant available in different sizes (Montgomery &
Montgomery 1997). Regardless of the etiology of unilateral adductor vocal fold paralysis
and  the  method  used,  the  success  of  thyroplasty  is  determined  by  measuring  the  changes 
of  laryngeal  structures  and  the  functionality  of  the  patient's  voice  post  thyroplasty. 
Likewise,  acoustic  and  aerodynamic  measures  can  be  used  as  adjuncts  to  evaluate  the 
extent  of  glottal  gap  pre-thyroplasty  and  post-thyroplasty  (Blaugrund  1991). 

Regarding  analysis  of  endoscopic  images  from  patients  with  vocal  fold  paralysis, 
Inagi  et  al.  (1997)  collected  samples  during  modal  phonation  and  during  respiratory 
maneuvers such as maximum voluntary inspiration and rest breathing. Three
stroboscopic frames were used: full adduction during modal
phonation,  full  abduction  during  maximum  voluntary  inspiration  and  at  a resting  position. 
Reference  points  (anterior  commissure,  paralyzed  vocal  fold  process,  normal  side  of  the 
vocal  process,  and  the  midarytenoid  point)  were  determined  for  each  glottal  frame  and 
then  associated  with  a stored  digitized  image  to  facilitate  standardization  of 
measurements of the glottal area, length and angle. Although the study by Inagi et al.
(1997), discussed in detail below, provided valuable input, it had inherent methodological
problems (e.g., manual tracing of laryngeal structures and lack of consideration of the
gradient nature of boundaries).

While  endoscopy  and  videostroboscopy  have  been  subjected  to  an  evolutionary 
process  of  technological  advancements,  it  is  clear  that  methodological  problems  still 
exist.  Before  addressing  what  steps  need  to  be  taken  to  resolve  some  of  the  existing 
methodological  problems,  this  study  provides  a review  of  the  history  of  laryngeal 
visualization. 


The  History  of  Laryngeal  Visualization 
Since  the  eighteenth  century,  voice  specialists  have  been  trying  to  find  better 
methods  of  visualizing  laryngeal  structures  and  studying  laryngeal  physiology.  The  first 
reported  attempt  to  visualize  the  larynx  took  place  in  1807.  The  device  used  consisted  of 
a round  metal  tube,  enlarged  at  the  observer's  end  and  closed  at  the  other  end.  A metal 
partition  was  inserted  through  the  tube  to  facilitate  illumination  through  one  side  and 
visualization  through  the  other.  The  inner  sides  of  the  tube  contained  a mirror  that 
allowed  the  viewer  to  see  the  reflection  of  the  laryngeal  structure  through  a small  side 
opening.  A lantern  containing  candle  wax  was  initially  used  for  illumination.  Light  was 
collected  from  the  candles  through  a curved  reflector  mirror  that  directed  the  light 
through  one  of  the  partitions  (Mackenzie  1865,  Moore  1937). 

In 1827, a small mirror was put into the pharyngeal area in order to visualize the
laryngeal  structure.  However,  a lack  of  illumination  caused  that  experiment  to  fail.  Other 
methods  mainly  used  mirrors  and  tried  to  improve  illumination  techniques.  The  major 
initial  illumination  source  that  scientists  used  was  sunlight.  To  provide  sufficient 
illumination,  scientists  introduced  a headband  with  a semi-spherical  reflector  that  served 
to  increase  sunlight  concentration  to  one  point.  Others  used  candlelight  as  a light  source 
(Moore  1937). 

The  first  successful  visualization  of  the  larynx  was  conducted  by  Manuel  Garcia 
in 1854. Garcia was a Spanish singing teacher who attempted to visualize the larynx using
two  dental  mirrors.  He  inserted  one  mirror  into  the  fauces  and  used  another  to  direct  the 
sunlight toward the same area. Improvement on Garcia's work began with the introduction
of  concave  mirrors  and  artificial  light  sources  that  achieved  better  illumination  and  more 
powerful  light  concentrations  (Hahn  & Kitzing  1978). 

Later  improvements  on  the  magnification  of  the  laryngeal  view  were  reported  in 
the literature. According to Moore (1937), Oertel introduced a mounted telescope with a
magnification  power  of  eight  times  in  1895.  Other  ideas  to  improve  magnification 
included  viewing  the  larynx  using  two  angles  (binocular  view).  In  1861,  scientists  used 
two  prisms  placed  behind  the  aperture  within  a head-mounted  concave  mirror.  The  image 
was  received  by  the  prisms,  and  reflected  by  another  mirror  to  the  examiner's  eyes. 
However,  this  trial  was  not  satisfactory  enough  because  the  examiner  could  only  see  the 
image  from  one  angle.  The  early  1960s  brought  further  improvements  such  as  attaching  a 
small  telescope  to  an  arm  of  an  illuminating  device  (Moore  1937). 

Laryngeal  Photography 

Laryngeal  photography  used  multiple  lens  systems.  These  systems  allowed 
multiple  pictures  of  the  laryngeal  structures  to  be  taken  simultaneously.  The  first  cameras 
used  a telescopic  lens  and  a convex  mirror  to  direct  the  laryngeal  image  towards  the  film. 
The  first  motion  pictures  of  the  larynx  were  taken  in  1913.  Using  these  motion  pictures, 
two  French  scientists,  Chevroton  and  Vies,  were  able  to  study  gross  movements  of  the 
vocal  folds  and  to  examine  their  width,  length  and  movement  during  breathing.  The 
development of the first color pictures of the larynx soon followed, making improved
color pictures the main concern at the time (Moore 1937).

The first high-speed motion picture of the larynx was made possible in 1937
by  Bell  Telephone  Laboratories  (Alberti  1978,  Timcke  et  al.  1958).  Timcke  et  al.  (1958) 
conducted  one  of  the  first  attempts  to  use  high-speed  photography,  studying  physiological 
activities  of  the  vocal  folds  such  as  open  phase  and  closed  phase  at  different  pitch  levels. 
Their  method  involved  seating  the  subject  on  an  adjustable  stool  before  a fixed  laryngeal 
mirror.  The  laryngeal  mirror  was  then  introduced  into  the  subject's  pharynx.  An  auxiliary 
mirror  provided  feedback  as  to  the  optimal  view  of  the  vocal  folds.  Other  feedback  was 
provided  to  the  experimenter  through  the  camera  viewing  system,  which  enabled  him/her 
to  note  the  vocal  pitch  that  allowed  best  exposure  of  the  entire  anteroposterior  length  of 
the  vocal  folds.  This  method,  they  stated,  offered  the  advantage  of  studying  each 
vibratory  cycle  separately,  frame  by  frame  without  possible  distortion  of  superimposing 
images  from  successive  samples.  The  sampling  rate  was  3000-5000  frames  per  second. 
Vocal  fold  pictures  were  magnified  15  times  by  a projection  system.  Measurements  from 
successive  photos  were  then  plotted  to  form  the  curve  of  the  glottal  wave.  A light  topical 
anesthetic  was  commonly  applied  to  the  soft  palate  to  reduce  gagging  resulting  from  the 
laryngeal  mirror.  When  the  proper  image  was  visualized,  the  camera  was  triggered 
(Hollien  & Moore  1960).  Although  the  use  of  high-speed  films  to  monitor  details  of  the 
glottal  cycle  was  common,  it  was  not  free  of  disadvantages.  For  example,  high-speed 
photography  was  difficult  to  conduct  in  many  laboratories  because  of  the  large  size  of  the 
equipment  and  the  high  level  of  examiner  training  required.  It  was  also  expensive,  and 
could  not  be  performed  during  natural  conditions  because  of  the  need  to  use  a laryngeal 
mirror, which interferes with normal speech production (Baer et al. 1983). Likewise, a
high-speed photography film 10 seconds in length required days of analysis (Alberti
1978). 

Endoscopy

Direct laryngeal endoscopy (or direct visualization of the larynx) was developed
in response to the shortcomings of Garcia's indirect laryngoscopy. Using mirrors to
visualize the larynx had little success whenever the anterior of the larynx required
visualization, either because the gag reflex was triggered or because the positioning
obstructed the laryngeal anatomy (Hahn & Kitzing 1978). With direct laryngeal
endoscopy, a general anesthesia was used because the procedure required inserting a
metal tube through the patient's mouth down to the larynx. Here again the laryngeal
function could not be assessed.

According to Padovan et al. (1973) another laryngeal endoscopy method was
introduced toward the end of the 19th century. Contrary to the previous method, this
method did not require general anesthesia. It used an endoscope coupled with a rigid
telescope that had a viewing angle of 90 degrees. The endoscopic telescope was inserted
toward the back of the pharynx through the oral cavity. The endoscopic light source was
a small incandescent light bulb. The examination required the patient to sit in front of the
examiner  who  used  one  hand  for  holding  the  instrument  and  another  for  holding  the 
patient's  tongue.  This  equipment  was  also  used  to  take  photographs  of  the  larynx  because 
it  used  electrical  bulbs  to  emit  sufficient  illumination  (Hahn  & Kitzing  1978).  Flexible 
fiberoptic  endoscopes  were  introduced  at  the  end  of  the  1960s  (Sawashima  & Hirose 
1981).  Fiberscopes  are  made  of  a bundle  of  flexible  fibers:  some  carry  the  light  to 
objects  being  examined  and  others  carry  the  image  back  to  the  viewer.  Fiberscopes  are 
usually  inserted  through  the  nasal  cavity  over  the  soft  palate,  passing  through  the 
oropharynx  and  hypopharynx.  They  can  reach  the  level  of  the  vocal  folds  because  of  their 
flexibility,  but  are  usually  placed  slightly  above  the  epiglottis.  They  are  equipped  with  a 
moveable  lens  tip  that  can  be  angled  and  rotated  to  view  the  larynx.  The  wide  angle  and 
zoom  lenses  provide  the  capability  to  view  the  laryngeal  structures  in  detail  without 
causing  the  patient  discomfort.  Because  of  this  flexibility,  these  endoscopes  can  also  be 
used  to  view  the  velopharyngeal  structures  in  addition  to  the  larynx.  In  general,  flexible 
endoscopes  have  the  advantage  of  allowing  a more  normal  posture  of  the  vocal  folds  and 
the  laryngeal  tract  than  is  allowed  by  the  rigid  endoscopes,  providing  the  capability  to 
evaluate  the  laryngeal  function  during  speech. 

Videolaryngostroboscopy

Video  recording  the  motion  of  the  vocal  folds  is  called  videolaryngostroboscopy. 
Videolaryngostroboscopy, or videostroboscopy for short, is the current and most widely
used method for documenting laryngeal function. Stroboscopy is the viewing of cyclically
moving objects in a way that makes them appear stationary or slow moving. It is based
on the phenomenon of optical illusion. The human retina is known to have an optical lag
period  that  gives  moving  objects  the  illusion  of  slow  motion.  When  an  object  is  presented 
to  a viewer  in  an  intermittent  manner,  it  appears  to  be  moving  slowly  or  standing  still. 
Stroboscopy  uses  that  phenomenon  (Alberti  1978,  Moore  1937).  The  retina  can  perceive 
five  images  per  second  and  the  time  lag  of  the  image  on  the  retina  is  estimated  to  be  one 
fifth  to  one  seventh  of  a second  (Kallen  1932).  Videostroboscopy  uses  this  amount  of 
time  to  fuse  illuminated  points  of  the  vocal  folds  during  vibration,  providing  an  average 
vibratory cycle based on several fused vibratory cycles (Bless et al. 1987) (see Figure 1-6).


Figure  1-6.  An  average  vocal  fold  vibratory  cycle  based  on  several  fused  vibratory 
cycles.  This  figure  is  a recreation  after  Colton  and  Casper  (1996). 

The stroboscopic phenomenon provides valuable input as to the function of the
vocal folds because the vocal folds vibrate too fast for the human eye to detect the details
of their movement (Kitzing 1985). Without videostroboscopy, it would not be
possible  to  view  the  composite  image  of  the  vibratory  cycle  and  subject  it  to  analysis. 

The concept of retinal lag was first used in laryngoscopy in 1878, and was soon
implemented through the use of a rotating disk with holes. The disk was initially rotated
by hand, which produced a variable rate. The instrument was improved by adding a concave
mirror that concentrated the light to one point through the holes of the disk. A telescope
was  added  to  increase  magnification  eight  times.  Other  improvements  of  the  instrument 
included the use of two light sources, electric light sources, condensing lenses, and
air-driven and electricity-driven rotating disks. The introduction of flashing lights shortly
followed. At the beginning, flashing lights were driven by electricity. In the early 1900s,
the  concept  of  using  sound  as  a regulating  force  for  the  flashing  light  was  introduced.  The 
sound  wave  stimulated  a transmitter  and  thus  set  a flow  of  current  that  caused  the  flashes 
of  light  to  be  synchronized  with  the  movements  of  the  sound  source.  Later  improvements 
included  increasing  light  intensity  using  two  light  tubes  and  increasing  the  light  flash 
speed  (Moore  1937).  The  current  stroboscopy  devices  (such  as  the  Kay  Elemetrics  RLS 
9100  Videostroboscope  System)  use  an  electronic  strobe  light  that  delivers  a high 
intensity  of  light  with  an  adjustable  light  duration  and  rate  of  discharge.  To  track  the 
frequency  of  the  vocal  folds,  a throat  microphone  is  put  on  the  skin  of  the  subject 
overlying  the  larynx  and  is  connected  to  a frequency  analyzer.  The  frequency  analyzer  is, 
in  turn,  connected  to  the  strobe  light,  which  is  triggered  in  synchrony  with  the  vocal  fold 
vibration.  The  frequency  of  illumination  can  be  controlled  manually  using  a foot  pedal  to 
achieve  slow  or  standstill  images  (Alberti  1978).  The  illumination  flashes  can  be  emitted 
either in synchrony with the frequency of vocal fold vibration, resulting in a standstill
image, or at a slight offset from the frequency of vocal fold vibration, usually ± 2
Hz, resulting in slow-motion images. A necessary condition to achieve the stroboscopic
effect  is  for  the  vocal  fold  vibration  to  be  periodic  or  quasi  periodic.  If  the  vocal  fold 
vibrations  are  too  aperiodic,  such  as  in  the  case  of  the  severely  hoarse  voice,  the  image 
may  not  be  clear.  In  aphonic  patients,  stroboscopy  is  of  limited  use,  because  there  is  no 
voicing  signal  to  track.  The  technique  also  requires  examiner  training  and  experience 
because  its  interpretation  is  subjective,  relying  on  the  user's  knowledge  of  normal  and 
pathological  vocal  fold  physiology. 
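
The relationship between flash rate and apparent motion described above can be sketched numerically. The following is a minimal illustration, assuming an idealized, perfectly periodic vibration; the function names and the 200 Hz fundamental frequency are hypothetical and do not describe the internals of any particular stroboscopy system.

```python
def apparent_frequency(f0_hz, flash_hz):
    """Apparent vibration frequency (Hz) seen under a strobe light.

    Assumes a perfectly periodic vibration at f0_hz and a flash rate close
    to f0_hz, so that each flash illuminates a slightly shifted phase.
    """
    return abs(f0_hz - flash_hz)


def sampled_phases(f0_hz, flash_hz, n_flashes):
    """Fraction of the vibratory cycle (0 to 1) illuminated by each flash."""
    return [(k * f0_hz / flash_hz) % 1.0 for k in range(n_flashes)]


f0 = 200.0  # hypothetical fundamental frequency of vocal fold vibration, Hz

# Flashes in synchrony with the vibration: every flash hits the same phase.
print(apparent_frequency(f0, f0), sampled_phases(f0, f0, 4))

# Flashes offset by 2 Hz: the illuminated phase creeps forward slowly.
print(apparent_frequency(f0, f0 - 2.0),
      [round(p, 4) for p in sampled_phases(f0, f0 - 2.0, 4)])
```

Because the sketch assumes strict periodicity, it also shows why aperiodic vibration degrades the image: if the cycle length changes between flashes, successive flashes no longer sample an orderly progression of phases.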

A new development in videostroboscopic systems is the Kay Elemetrics Digital
Video Recording System (DVRS), Model 9300. This system has features superior to those of
the Kay RLS Videostroboscopy System, Model 9100. These include digitizing the video
examination directly from the strobe camera to the processor without the need for analog
VHS or SVHS videotape recording. It has better endoscopy and stroboscopy image
quality,  fast  recall  and  playback  of  recorded  examinations,  and  multiple  image  display 
which  enables  comparison  of  examinations  (e.g.,  pre-  and  post-surgery).  Kay  Elemetrics 
claims that this system has superior hardware with no motion artifacts, true color
representation (to facilitate the assessment of inflammation), and reduced noise in the
signals,  which  allows  higher  video  compression  ratios  and  less  memory  to  store  the  video 
signal  (http://www.kayelemetrics.com/dvrs.htm).  These  features  need  to  be  thoroughly 
investigated  before  speech  pathologists  and  voice  scientists  accept  this  system  as  the  tool 
of  choice. 


Quantification  of  Endoscopic  Images 

The  contribution  of  empirical  studies  of  laryngeal  anatomy  comes  largely  from 
the  examination  of  excised  larynges,  with  numerous  anatomical  studies  documenting 
macroanatomical  dimensions  published  in  the  open  literature.  The  work  of  Gray  (1926), 
Hirano et al. (1983), and Kahane (1981) provides some of the archetypal anatomical
information  on  laryngeal  structure.  This  data  is  currently  used  to  interpret  functional 
phonatory  distinctions  found  with  development,  aging  and  dysfunction  (Stathopoulos  & 
Sapienza  1997,  Titze  1989,  1994).  While  measurement  of  morphological  information 
from  excised  larynges  offers  the  advantage  of  accessibility  for  measuring  laryngeal 
dimensions,  it  has  less  value  for  determining  measurements  from  large  numbers  of 
individuals  and  limited  utility  for  determining  laryngeal  dimensions  during  breathing  and 
voicing.  Additionally,  anatomical  detail  of  laryngeal  pathologies  cannot  be  defined. 
Moreover,  methodological  artifact  associated  with  tissue  death  and  its  preservation  after 
excision  makes  measurements  of  laryngeal  anatomy  from  cadaveric  specimens 
susceptible  to  calculation  error.  Other  artifacts  associated  with  working  with  cadaveric 
specimens  include  the  embedding  material  used  to  preserve  the  structure.  The  use  of 
paraffin  or  celloidin  to  embed  larynges  almost  invariably  creates  soft  tissue  damage 
because  of  the  need  for  decalcification  before  preparing  the  specimen.  Also,  tissue 
shrinkage  usually  occurs.  In  addition,  these  techniques  often  result  in  a loss  of  the  natural 
color  of  the  tissue,  prohibiting  accurate  analysis  of  pathological  details.  With  plastination 
of  excised  larynges,  a fairly  new  preservation  technique,  the  above-mentioned  artifacts 
are  lessened.  However,  because  the  tissue  being  studied  is  still  from  an  excised  structure, 
examining  the  genuine  inherent  mechanical  properties  of  the  system  is  impossible. 
Finally, with any preservation technique, the time involved in processing the tissue is too
costly for examining large numbers of specimens, excision usually needs to take place
immediately after death, and repeated measurements at different times are not possible
(Eckel & Sittel 1995). For example, plastination takes from five to eight weeks to
preserve  the  tissue  making  it  less  than  optimum  for  research  purposes  and  better  suited 
for  instruction  and  the  study  of  a small  number  of  samples  (Eckel  et  al.  1993). 

Measuring  laryngeal  anatomy  in  vivo  has  interesting  implications  as  a method  for 
quantifying  vocal  fold  structure  as  well  as  other  anatomical  forms  involved  during 
breathing  and  voice  production  (Woo  1996).  As  stated  previously,  the  current  and  most 
widely  used  method  for  documenting  the  status  of  the  laryngeal  structure  is  endoscopy, 
with  video  recording  of  digital  imagery  for  archival  purposes.  Unfortunately,  the  in  vivo 
determination of laryngeal anatomy and dimensions has been fraught with
difficulties  involving  image  distortion  from  the  endoscopic  technique.  Persistent 
questions  remain  concerning  difficulties  in  obtaining  measurable  objective  data  from 
endoscopy,  particularly  with  flexible  transnasal  endoscopy.  With  transnasal  endoscopy, 
distortions  of  the  video  image  exist  from  wide-angle  endoscopic  lenses,  whose  field  of 
view  ranges  from  60  to  95  degrees.  Casper  et  al.  (1988)  viewed  objects  through  a 
transnasal  endoscope  to  determine  the  extent  of  image  distortion.  They  examined  both 
static  structures  and  objects  manipulated  dynamically.  Their  study  showed  the  existence 
of  (a)  radial  distortion,  where  static  structures  appear  progressively  smaller  toward  the 
image periphery and (b) distance or axial distortion, whereby objects proximal to the
endoscope  appear  larger  and  more  distorted  than  distant  objects.  The  greatest  limitation 
thus  far  reported  with  transnasal  endoscopy  is  the  lack  of  full  view  of  the  laryngeal  image 
during  dynamic  tasks.  For  example,  the  anterior  commissure  often  cannot  be  visualized, 
prohibiting  measurement  of  features  such  as  total  vocal  fold  length. 

A rigid  scope  offers  diminished  distance  distortion  when  compared  to  a flexible 
scope  because  of  its  smaller  vertical  movements  and  can  often  obtain  a complete  image 
of  the  larynx  during  dynamic  tasks.  Unfortunately,  radial  distortion  still  occurs,  but  has 
been  quantified  for  selected  non-laryngeal  endoscopes  (Asari  et  al.  1999).  However, 
currently published theory and analysis do not adequately quantify the effect of
transient  distortions  (e.g.,  mucoid  secretions  on  the  endoscope  lens,  or  tissue  movement 
within  the  field  of  view)  or  dimensional  artifacts  involved  in  3-D  imaging  and  analysis  of 
laryngeal  structures. 
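
The radial and distance distortions described above can be illustrated with a simple camera model. The sketch below assumes a pinhole projection and a one-coefficient radial distortion model; this is a generic textbook model chosen for illustration, not the model used by Casper et al. (1988) or Asari et al. (1999), and the focal length and coefficient values are hypothetical.

```python
def apparent_size(true_size_mm, distance_mm, focal_px=500.0):
    """Pinhole projection: imaged size in pixels scales as 1 / distance."""
    return focal_px * true_size_mm / distance_mm


def distort_radius(r, k=-0.3):
    """One-coefficient radial model: r_distorted = r * (1 + k * r**2).

    r is the distance from the image center in normalized units (0 to 1).
    A negative k (an assumed value) compresses points more strongly the
    farther they lie from the center.
    """
    return r * (1.0 + k * r * r)


def imaged_length(r_start, r_end, k=-0.3):
    """Imaged length of a radial segment running from r_start to r_end."""
    return distort_radius(r_end, k) - distort_radius(r_start, k)


# Distance distortion: the same 10 mm structure viewed from 40 mm and 80 mm.
print(apparent_size(10.0, 40.0), apparent_size(10.0, 80.0))

# Radial distortion: the same 0.2-unit segment at the center and near the edge.
print(imaged_length(0.0, 0.2), imaged_length(0.6, 0.8))
```

Under these assumptions, halving the distance doubles the imaged size, and an identical segment occupies fewer image units near the periphery than at the center, which is why absolute pixel measurements cannot be compared without calibration.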

Given that the analysis of endoscopic images is fairly new (see Third International
Workshop, Advances in Quantitative Laryngoscopy, Germany, June 19-20, 1998
(http://www.klinikum.rwth-aachen.de/webpages/mib/mbv/larv98/); Digital Quantitative
Laryngoscopy, last updated 02.07.99 (http://www.klinikum.rwth-aachen.de/
/mib/mbv /proiects/laryngoskopie/QV uk.html); Image Processing for the Medical
Sciences, last updated on 02-12-1998 (http://www.klinikum.rwth-aachen.de/webpages
uk/ mib/mbv/bvm98/index_e.html)) and the measurement artifacts are still largely
unresolved,  development  of  a method  which  increases  analytical  accuracy  and 
experimentally  quantifies  optical  distortion  with  regard  to  image  size  and  color  is  of 
increased  interest  empirically  and  theoretically.  Present  methodological  discussion 
focuses  on  such  topics  as  color  and  illumination  distortion,  and  calibration  issues  relevant 
to  measuring  areas  and  lengths  from  videotaped  samples  (Hassan  et  al.  1998,  Palm  et  al. 
1998). 

In 1990, an international group of scientists developed a protocol outlining
imaging  standards  for  videofluoroscopy  and  endoscopy.  While  these  standards  were 
considered  for  viewing  the  upper  vocal  tract,  the  artifact  reported  with  these  methods 
applies  to  laryngeal  imaging.  The  following  issues  were  brought  forth  by  the  group 
(Goulding-Kushner  et  al.  1990,  p.  338): 

1. Standardization should be based on a ratiometric rather than absolute
measurement; 

2.  The  placement  of  referents  of  known  size  or  color  within  the  endoscopic  field  of 
view  is  a difficult  goal  that  may  not  be  relevant  clinically  and  should  be  explored; 

3.  Proximity  to  the  object  being  measured  endoscopically  is  not  constant  and  needs 
consideration; 

4.  Angle  of  the  scope  tip  from  the  image  can  create  angle  distortion  and  this 
distortion  should  be  quantified; 

5.  All  endoscopic  observations  should  be  based  on  the  views  where  all  or  most  of 
the  structure  can  be  seen  in  a single  field  of  view. 

The  catalyst  that  has  driven  much  of  the  research  with  regard  to  laryngeal  image 
analysis  has  been  documentation  of  changes  in  the  laryngeal  structure  as  a function  of 
intervention.  Investigations  that  have  analyzed  the  laryngeal  structures  with  image 
analysis  are  available  in  the  open  literature.  Scientists  have  used  various  techniques  and 
measurement  strategies  to  define  laryngeal  dimensions.  General  methodology  common  to 
most  studies  indicates  that  subjects  sustain  the  vowel  /i/  at  a habitual  pitch  and  loudness 
during oral rigid endoscopic examination. Habitual pitch and loudness are used because
distance  measurements  are  influenced  by  vocal  fold  length,  which  varies  with  changes  in 
pitch  and  loudness  (Titze  1993).  Laryngostroboscopic  images  are  recorded  at  30 
frames/second.  The  recorded  sequential  images  of  the  vocal  fold  vibration  are  played 
back  frame-by-frame  and  digitized  on  a microcomputer. 

In  the  area  of  speech  language  pathology,  inexpensive  quantification  of 
endoscopic  images  has  been  promoted  as  sufficient  for  quantifying  laryngeal  dimensions 
(Gonsalves  & Leonard  1998).  The  methods  call  for  inexpensive,  commercial  off-the- 
shelf  digital  imaging  processing  hardware,  as  well  as  software  such  as  Adobe  Photoshop 
and  Gimp.  Unfortunately,  such  commercial  resources  are  limited  by  the  scope  of  their 
respective  designs,  which  often  do  not  include  types  of  processing  required  for  medical 
research  and  rigorous  clinical  diagnostic  practice.  For  example,  Adobe  Photoshop  does 
not  support  automatic  or  semi-automatic  image  processing  and  analysis  algorithms. 
Another  imaging  software  program  that  has  been  used  to  assess  laryngeal  dimensions  has 
been  the  NIH  Image  software,  which  appears  to  be  commonly  used  in  medical  research. 
For  example,  Omori  et  al.  (1996,  1998)  used  the  NIH  Image  program  to  define  the  size  of 
the  glottal  gap,  which  is  the  area  between  the  medial  margins  of  the  vocal  folds.  This 
measurement  was  done  on  the  basis  of  differences  between  the  color  density  of  the  vocal 
fold  and  the  glottal  gap  at  the  maximum  closed  point  of  vibration.  The  manually  traced 
glottal  gap  was  calculated  in  units  of  pixels,  which  were  assumed  to  be  square,  for 
convenience.  Membranous  vocal  fold  length  (MVFL)  was  measured  from  the  point  of  the 
anterior  commissure  to  the  point  of  the  tip  of  the  vocal  process  and  was  expressed  in  units 
of pixels. Glottal-gap area A was normalized by the square of MVFL as follows:

$$A_{\mathrm{norm}} = \frac{A}{(\mathrm{MVFL})^{2}} \times 100 \ \text{units}$$

Maximum glottal-gap width W and posterior glottal-gap width P were measured at the
maximum closed point in the vibratory cycle. W, defined as the distance between the free
edges of the membranous vocal folds at the point of maximum width, was expressed in
units of pixels, and was normalized by MVFL to yield:

$$W_{\mathrm{norm}} = \frac{W}{\mathrm{MVFL}} \times 100 \ \text{units}$$

Posterior glottal-gap width P was measured as the distance between the tips of the vocal
processes and expressed in units of pixels. When a gap exists in the posterior aspect of
the glottis, P is normalized by MVFL in the same way as W to yield:

$$P_{\mathrm{norm}} = \frac{P}{\mathrm{MVFL}} \times 100 \ \text{units}$$

Computation  of  A,  W,  and  P using  the  NIH  Image  software  was  shown  by  Omori  et  al., 
based  on  manual  fitting  of  boundaries  to  the  glottal  gap  region.  This  method,  while  useful 
as  an  introductory  technique,  may  lack  consistency  and  accuracy.  Consistency  may  be 
absent  because  of  a lack  of  rigorous  area  segmentation  and  boundary-finding  procedures. 
For  example,  when  landmarks  or  reference  points  of  laryngeal  structures  are  located 
manually,  results  depend  significantly  on  operator  bias  that  can  be  influenced  by  visual 
acuity,  extent  of  training,  state  of  health  or  fatigue,  environmental  factors  such  as  ambient 
lighting,  as  well  as  systematic  issues  such  as  resolution  or  contrast  of  a computer  display 
or  printout.  Lack  of  reproducibility  can  directly  result  from  manual  task  performance. 
Additionally,  a crucial  problem  that  has  not  been  reported  in  the  literature  is  the  gradient 
nature  of  the  imaged  boundary  between  the  vocal  fold  and  the  glottal  gap  area.  The 
choice  of  boundary  points  can  strongly  influence  the  automated  image-based  computation 
of  glottal  gap  area. 
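
The normalization by membranous vocal fold length described above can be sketched as follows. This is a minimal illustration, assuming that area is divided by the square of MVFL and widths by MVFL, each scaled by 100; the function name and the pixel values are hypothetical and are not taken from Omori et al.

```python
def normalize_glottal_measures(area_px, max_width_px, post_width_px, mvfl_px):
    """Express glottal-gap area and widths relative to MVFL (all in pixels).

    Area is divided by MVFL squared and widths by MVFL, so the results do
    not depend on the overall magnification of the image.
    """
    if mvfl_px <= 0:
        raise ValueError("MVFL must be a positive number of pixels")
    return {
        "area": area_px / mvfl_px ** 2 * 100.0,
        "max_width": max_width_px / mvfl_px * 100.0,
        "posterior_width": post_width_px / mvfl_px * 100.0,
    }


# Hypothetical measurements of one larynx imaged at two magnifications:
# the second image is twice as large, so lengths double and area quadruples.
near = normalize_glottal_measures(1600, 20, 12, 200)
far = normalize_glottal_measures(400, 10, 6, 100)
print(near)
print(far)
```

The two calls yield the same normalized values, which is the purpose of the normalization. It corrects only for uniform magnification, however; it does not remove operator-dependent error in placing the landmarks that define MVFL or the gap boundary.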

One  of  the  first  to  discuss  quantification  of  videostroboscopic  images  from  a large 
pool  of  subjects  with  normal  vocal  function  was  Woo  (1996).  He  systematically  observed 
changes  in  videostroboscopic  images  in  100  males  and  females  with  normal  voice  as 
pitch  and  loudness  were  varied.  By  plotting  and  analyzing  glottal  area  waveforms  from 
frame-by-frame  analysis  of  the  glottal  cycle  he  was  able  to  report  on  the  normal 
variations  of  the  vocal  fold  structure  as  it  is  related  to  sex  and  vocal  function.  Both 
flexible  and  rigid  endoscopy  were  used,  although  the  rigid  endoscopy  was  preferred 
because of the superior image quality. Image capturing and analysis were done with a
personal computer (PC) outfitted with a video digitizer board. The vocal fold area of
interest was defined and clipped for digital image manipulation using commercially
available image analysis software (Bioscan, Optimas Corp., Edmonds, Washington). For
each  image,  the  vocal  folds,  arytenoid  complex,  and  epiglottis  were  defined  as  areas  of 
interest.  The  defined  image  was  adjusted  for  contrast  and  brightness,  which  influenced 
subsequent image digitization. To enhance the image for automated glottal area tracing,
histogram equalization was performed on the image using a uniform histogram distribution
function. According to Woo, this function produces a uniformly distributed histogram of
pixel intensity, thereby increasing image contrast to facilitate line and edge detection.
Glottal area was automatically traced. Glottal area rather than glottal width was
chosen because the midpoint of each vocal fold’s vibratory margin is not accurately
defined in many cases. To trace the glottal gap boundaries, the computer cursor was
placed  on  the  glottal  margin  and  automatic  tracing  of  the  glottal  margin  was  done  using  a 
luminescence  shift  gradient  method,  which  is  used  by  Bioscan  for  edge  detection.  The 
luminescence  gradient  shift  can  be  set  to  detect  areas  of  maximal  changes  in  light  to  dark, 
thereby  detecting  the  vocal  fold  edge  (P.  Woo,  Personal  correspondence,  September  6, 
2000).  Area  tracing  was  verified  by  visual  inspection,  and  then  glottal  area  was  measured 
by  counting  pixels  within  the  traced  area. 
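
The contrast-enhancement step described above can be illustrated with a minimal sketch of histogram equalization on an 8-bit grayscale image. This is a generic textbook formulation and is not the routine implemented in Bioscan; the function name `equalize_histogram`, the nested-list image representation, and the sample values are assumptions made purely for illustration.

```python
def equalize_histogram(image, levels=256):
    """Remap gray levels so that their cumulative distribution is roughly uniform.

    `image` is a list of rows of integer gray values in [0, levels - 1].
    """
    pixels = [p for row in image for p in row]
    total = len(pixels)

    # Histogram of gray levels.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Cumulative distribution function (running sum of the histogram).
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)

    # Smallest non-zero CDF value, so that the darkest occupied level maps to 0.
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        # Constant image: there is no contrast to stretch.
        return [row[:] for row in image]

    scale = (levels - 1) / (total - cdf_min)
    lut = [max(0, round((c - cdf_min) * scale)) for c in cdf]
    return [[lut[p] for p in row] for row in image]


# Hypothetical low-contrast patch (gray values 100 to 103 only).
low_contrast = [
    [100, 100, 101, 101],
    [100, 102, 102, 101],
    [103, 103, 102, 101],
]
equalized = equalize_histogram(low_contrast)
```

After equalization the four occupied gray levels are spread across the full 0 to 255 range while keeping their order, which is the property that makes a subsequent light-to-dark edge search easier.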

As  stated  previously,  because  laryngeal  images  vary  in  size  depending  on  the 
distance  of  the  endoscope  from  the  vocal  folds,  comparison  of  glottal  images  and  areas 
across  subjects  is  not  possible  without  internally  consistent  calibration.  Woo  (1996)  thus 
selected  internal  laryngeal  landmarks  of  the  anterior  commissure  and  the  vocal  process  as 
reference  points  for  each  subject.  These  points  were  manually  selected  using  a computer's 
pointing  device  and  associated  cursor,  to  define  the  glottal  length.  To  normalize  the 
glottal  area  among  samples,  the  absolute  glottal  area  in  pixels  was  divided  by  glottal 
length  and  was  defined  as  the  normalized  glottal  area.  Technical  difficulties  in  Woo’s 
laryngoscopic  technique  included  error  because  of  light  and  shadow  effects  and  problems 
of mucus stranding. Unfortunately, the previously discussed errors of manual location or
tracing with subsequent analytical errors, as well as the problems of a multi-pixel
boundary gradient, adversely affected Woo’s work as well as Omori et al.’s research.
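
The normalization described above (absolute glottal area in pixels divided by glottal length) can be sketched as follows. The function names, the (x, y) landmark representation, and the numeric values are hypothetical and are not taken from Woo (1996); the sketch only assumes that glottal length is the straight-line pixel distance between the anterior commissure and the vocal process.

```python
import math


def glottal_length(anterior_commissure, vocal_process):
    """Straight-line distance in pixels between two (x, y) landmarks."""
    (x1, y1), (x2, y2) = anterior_commissure, vocal_process
    return math.hypot(x2 - x1, y2 - y1)


def normalized_glottal_area(area_pixels, anterior_commissure, vocal_process):
    """Absolute glottal area (pixels) divided by glottal length (pixels)."""
    length = glottal_length(anterior_commissure, vocal_process)
    if length == 0:
        raise ValueError("landmarks coincide; glottal length is zero")
    return area_pixels / length


# Hypothetical values for illustration only.
nga = normalized_glottal_area(1200, (160, 40), (160, 190))
```

Because the result is an area divided by a length, it still carries units of pixels.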

In  another  study,  Inagi  et  al.  (1997)  sought  to  determine  if  there  was  any 
correlation  between  traditional  horizontal  position  classification  of  non-functioning  vocal 
folds  and  vocal  function.  Samples  were  collected  during  modal  phonation,  maximum 
voluntary  inspiration  and  rest  breathing  in  those  with  vocal  fold  paralysis.  Three  glottal 
stroboscopic  frames  were  obtained:  full  adduction  during  modal  phonation,  full  abduction 
during  maximum  voluntary  inspiration  and  at  a resting  position.  Four  reference  points 
were  determined  for  each  glottal  frame:  anterior  commissure,  the  vocal  process  of  the 
paralyzed  side,  the  vocal  process  of  the  normal  side  and  midarytenoid  point. 

Midarytenoid  point  was  determined  after  reviewing  the  vocal  fold  movement  patterns  so 
as  not  to  be  biased  by  position  of  the  paralyzed  fold.  The  maximum  medial  movement  of 
the  normal  vocal  fold  during  modal  phonation  was  used  to  identify  the  midline.  A marker 
was  placed  at  the  site  of  the  normal  vocal  process  (during  maximum  adduction)  and  a 
horizontal  line  was  extended  posteriorly  to  meet  the  posterior  glottis.  These  reference 
points  were  saved  on  the  digitized  image  to  facilitate  standardization  of  measurements  of 
the  glottal  area,  length  and  angle.  Findings  indicated  that  vocal  function  was  not  only 
affected  by  the  vocal  fold  deviation  from  the  midline  but  also  by  the  shape  of  the 
subglottis,  the  glottic  plane  of  closure,  and  bowing  of  the  vocal  folds.  Inagi  et  al.  (1997) 
also  suggested  that  visual  observation  of  the  bowing  of  the  vocal  folds  is  not  exclusive  in 
determining  vocal  function  pre  and  post  thyroplasty.  They  added  that  obtaining 
endoscopic  images  of  the  folds  while  the  subject  is  breathing  could  reduce  error  in 
measurement because the vocal folds would be less affected by other factors, such as
vocal  fold  stretching  during  phonation.  During  phonation,  the  paralyzed  vocal  fold  is 
stretched  longitudinally  as  a result  of  force  exerted  on  it  by  the  normal  vocal  fold. 

Finally,  digital  image  analysis  of  videotaped  fluoroscopic  images  objectifying  the 
measurement  of  pharyngeal  area,  hyoid  displacement,  esophageal  sphincter  opening  as 
well  as  tongue  and  jaw  motion  has  been  performed  in  support  of  studies  that  address  the 
complexity  of  the  swallowing  process.  This  type  of  analysis  has  also  been  applied  to  the 
definition of lip excursion, as well as lip and jaw motion for defining changes in speech
intelligibility (Turner & Williams 1991). Thus, the use of digital image analysis has had, and
continues to have, an important role in the quantification of anatomical structures,
particularly  as  they  relate  to  normal  development  and  the  impact  of  disease  processes. 

Statement  of  the  Problem 


Justification

Accurate  quantification  of  laryngeal  findings  is  a topic  that  both  speech-language 
pathologists and voice scientists are starting to pursue, to decrease dependence on
subjective measurements of the voice. The review of studies presented above provides
examples from a growing body of literature that has focused on this specific topic. A
number of commercial image quantification methods are available which attempt to
provide cost-effective methods for quantifying laryngeal images. Unfortunately, it
appears  that  the  proposed  techniques  may  have  inherent  software  and  methodological 
disadvantages.  One  main  software  problem  is  the  lack  of  automated  glottal  area  tracing. 
The  visual  determination  of  laryngeal  landmarks  and  laryngeal  border  identification  is 
also a prevalent problem. As stated earlier, problems inherent in endoscopic /
videostroboscopic techniques, such as radial, axial, color and display distortion, are also
present  (Casper  et  al.  1988). 

The  software  packages  used  in  this  study  support  a wide  variety  of  image 
processing and enhancement operations. However, the capability of commercial or freely
distributed packages such as Adobe Photoshop, Gimp, and Scion Image to produce accurate and
consistent measurements of laryngeal structures has yet to be tested. This study is an attempt
to  do  so,  in  order  to  provide  information  about  their  applicability,  if  any,  to  clinical 
practice. 

Pilot  Study 

Investigation  of  state-of-the-art  laryngeal  image  acquisition  and  enhancement 
was  conducted  as  a multidisciplinary  effort  among  the  Department  of  Communication 
Sciences  and  Disorders,  the  Department  of  Computer  and  Information  Science  and 
Engineering,  and  the  Department  of  Otolaryngology  at  the  University  of  Florida.  The 
initial  investigation  set  out  to  test  the  effects  of  endoscope  distortions  on  image  analysis 
regardless of the software used. A set of test objects was used that included a plastic laryngeal
model, an excised male larynx (for realism), graph paper (to estimate barrel distortion), and
a commercial color chart (to measure color discrimination). An embalmed
larynx  was  obtained  from  the  Anatomical  Board  in  the  School  of  Medicine  at  the 
University  of  Florida.  The  embalmed  larynx  was  prepared  by  the  Anatomical  Board  as 
follows:  the  hyoid  bone  was  removed  and  the  external  supra  and  infra  hyoid  muscles 
were  removed.  Further  necessary  excisions  were  performed  in  the  Laryngeal  Function 
Laboratory  at  the  Department  of  Communication  Sciences  and  Disorders.  The  vocal  folds 
were exposed by removing the following structures: the thyrohyoid muscle, the
thyrohyoid membrane, the posterior part of the conus elasticus, and the epiglottis along
with the aryepiglottic muscles extending from the sides of the epiglottis to the apexes of
the  arytenoid  cartilages.  The  excised  larynx  was  taken  to  the  Ear,  Nose  and  Throat  Clinic 
at  Shands  Hospital  to  obtain  endoscopic  images.  To  achieve  stability,  the  larynx  was 
fixed  on  a paraffin  wax  board  using  metal  pins. 

Sixty-three  320  x 240-pixel  images  of  the  test  objects  were  obtained  with  a Kay 
RLS-9100  Rhino-Laryngeal  Stroboscope  (imaging  endoscope)  (Kay  Elemetrics,  Lincoln 
Park,  NJ)  in  a clinical  setting.  Several  example  laryngeal  images  representative  of  this 
dataset  are  shown  in  Figures  1-7  and  1-8.  The  approximate  vertical  distances  of  15  mm 
between  the  vocal  folds  and  the  tip  of  the  lenses  of  the  flexible  endoscope  and  70  mm 
between  the  vocal  folds  and  the  tip  of  the  lenses  of  the  rigid  endoscope  were  used.  These 
vertical  distances  were  determined  by  subjective  judgment  of  the  typical  distance 
between  each  endoscope  and  the  vocal  folds.  The  images  were  not  enhanced  following 
digitization, to preserve input data. Standard methods of color segmentation (Schmalz &
Caimi 1996) were applied to yield spatial maps, an example of which is shown in Figure
1-8b. From these segmentation maps, parameters such as Mean Vocal Fold Length and
Glottal  Gap  Area  (GGA)  were  obtained  by  standard  methods  (i.e.,  morphological 
processing).  Morphological  processing  is  a technique  that  uses  erosion  and  dilation  of 
pixels.  It  can  be  used  for  signal  enhancement  or  suppression  and  edge  detection. 
Morphological  filters  have  been  used  to  process  biomedical  images  in  the  study  of 
Alzheimer's  disease  and  cytology  (Swett  et  al.  1997). 
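
Erosion and dilation, the two operations named above, can be illustrated on a small binary segmentation map. This is a minimal sketch that assumes a 3 x 3 square structuring element and a nested-list mask; it is not the morphological processing chain used in the pilot study, and all function names and the toy data are hypothetical.

```python
def _neighborhood(mask, r, c):
    """Values in the 3 x 3 neighborhood of (r, c); pixels outside the image count as 0."""
    rows, cols = len(mask), len(mask[0])
    return [
        mask[rr][cc] if 0 <= rr < rows and 0 <= cc < cols else 0
        for rr in (r - 1, r, r + 1)
        for cc in (c - 1, c, c + 1)
    ]


def erode(mask):
    """A pixel stays 1 only if every pixel in its 3 x 3 neighborhood is 1."""
    return [[int(all(_neighborhood(mask, r, c))) for c in range(len(mask[0]))]
            for r in range(len(mask))]


def dilate(mask):
    """A pixel becomes 1 if any pixel in its 3 x 3 neighborhood is 1."""
    return [[int(any(_neighborhood(mask, r, c))) for c in range(len(mask[0]))]
            for r in range(len(mask))]


def opening(mask):
    """Erosion followed by dilation; removes specks smaller than the 3 x 3 element."""
    return dilate(erode(mask))


def area(mask):
    """Area of the segmented region, counted in pixels."""
    return sum(sum(row) for row in mask)


# Toy segmentation map: a 3 x 4 "gap" region plus one isolated noise pixel.
segmentation = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 0, 1],
    [0, 0, 0, 0, 0, 0, 0],
]
cleaned = opening(segmentation)
```

In this toy case the opening removes the isolated pixel and leaves the rectangular region intact, so the pixel-count area drops from 13 to 12.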

Manual  tracing  of  the  inner  and  outer  boundaries  of  the  glottal  gap  was  also  used 
to validate GGA, after Omori et al. (1998), which yielded errors ranging from 1.3 % to
2.8 % of total area (in pixels). This preliminary study has augmented existing reports in
the literature to show that key problems with current techniques of automated laryngeal
image analysis are threefold. First, the spatially nonuniform endoscopic response function
produces contrast reduction in the image periphery (bottom of Figure 1-7a) as well as
spatial distortion, which is shown in Figure 1-10. Second, because of a lack of empirical
or analytical modeling of the vocal fold cross-sectional profile and its effect upon image
contrast or color consistency, the glottal gap boundary is inaccurately established. For
example, a change in three pixels (half-maximum of Figure 1-7a model larynx vocal fold
boundary) yields a change of 509 pixels of glottal gap area computed from Figure 1-7a.


Figure  1-7.  Laryngeal  images  (a)  plastic  model,  image  obtained  using  an  oral  rigid 
endoscope;  (b)  excised  male  larynx,  image  obtained  using  an  oral  rigid 
endoscope;  and  (c)  excised  male  larynx,  image  obtained  using  a flexible 
transnasal  endoscope. 


This  represents  a completely  unacceptable  error  of  40.7  % in  the  glottal  gap  area 
of 1251 ± 31 pixels measured from the down-looking view shown in Figure 1-7a. The
tolerance of 31 pixels is the mean difference in glottal gap area among five replicate
measurements taken from Figure 1-7a.
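
The percentage quoted above follows directly from the two pixel counts reported for the plastic model. The short sketch below reproduces that arithmetic; the function name `percent_error` is introduced here for illustration only.

```python
def percent_error(area_change, nominal_area):
    """Area change expressed as a percentage of the nominal area."""
    return 100.0 * area_change / nominal_area


# Values reported in the text for the plastic model (Figure 1-7a):
# a 509-pixel change against a nominal glottal gap area of 1251 pixels.
error_plastic = percent_error(509, 1251)
print(f"{error_plastic:.1f} %")
```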

When  imaging  the  male  larynx  using  a direct  down  looking  view  (rigid  endoscope 
head), an error of 455 pixels in a nominal glottal gap area of 2574 ± 32 pixels was
measured  from  the  inner  boundary  of  the  glottal  gap,  which  represents  an  error  of  17.6  %. 
The difference in measurement error between the plastic model and the male larynx is
partially due to the coloration and slope of the vocal folds in the plastic model, and is
presented to show the effect of vocal fold cross-sectional profile and color on the GGA
measurement. For example, it was found that spectral content (color) between left, center,
right, and downward-tilted viewing positions varied as much as 25.8 % with respect to
in-band mean values (e.g., on the plastic model larynx, the mean gray value for the value
(intensity or brightness) band varied from 147 / 255 in the down-looking view to 109 / 255
in the centered view). In the excised male larynx, color constancy improved, with error
between views ranging from 0.6 % to 7.1 %.

Figure 1-8. Glottal gap segmentation in female subject with vocal fold edema and small
vocal fold nodules: (a) source image, (b) spectral-based segmentation of (a),
(c) morphological processing applied to (b), (d) segmentation of
morphologically processed glottal gap.

As  shown  in  Figure  1-9,  the  dominant  spectral  band  in  the  RGB  image  is  red, 
which  is  to  be  expected  in  vascularized  tissue.  In  order  to  test  color  constancy  with  a 
calibrated  sample,  a commercial  paint  chart  with  two  adjacent  colors  (Sherwin-Williams 
282A-4  and  -3)  that  were  just  distinguishable  to  three  human  observers  with  normal 
color vision was used. For example, means in the value and red bands ranged from 168 ±
9 out of a possible 255 to 174 ± 10 in the centered view for the 282A-3 color and ranged
from 229 ± 9 to 234 ± 8 for the 282A-4 color. Because 174 + 2(10) = 194 < 211 = 229 - 2(9),
that is, the two-standard-deviation intervals do not overlap, it was claimed that the
endoscope, digitization system, and image analysis program (custom software from UF
Center for Computer Vision and Visualization) distinguished between samples at a level
of 2σ (two standard deviations). Thus, the differences in spectral response shown in
Figures 1-9a-c are not
because  of  noise  or  lack  of  color  constancy  within  the  endoscope  hardware,  but  are  view 
and  specimen  dependent.  In  laryngeal  imaging,  such  errors  have  causes  that  are  not  well 
understood,  and  therefore  must  be  quantified,  modeled,  and  analyzed,  in  order  to  bound 
(define)  reliability  of  image-based  endoscopic  measurements. 
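
The two-standard-deviation argument used above for the color chart can be written as a small check. This is a sketch of the stated inequality only; the function name and the parameter `k` are hypothetical, and the second call uses made-up values to show a failing case.

```python
def separated_at_k_sigma(mean_low, sd_low, mean_high, sd_high, k=2.0):
    """True if the k-standard-deviation intervals of two samples do not overlap."""
    upper_of_low = mean_low + k * sd_low      # e.g. 174 + 2(10) = 194
    lower_of_high = mean_high - k * sd_high   # e.g. 229 - 2(9)  = 211
    return upper_of_low < lower_of_high


# Values from the text: 174 ± 10 for the 282A-3 color, 229 ± 9 for the 282A-4 color.
chart_distinct = separated_at_k_sigma(174, 10, 229, 9)

# Hypothetical counter-example: means only 16 gray levels apart.
too_close = separated_at_k_sigma(174, 10, 190, 9)
```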

A third  problem  with  current  laryngeal  imaging  practice  is  the  lack  of  a 
prespecified  or  standardized  clinical  procedure,  which  would  reduce  distortion  and 
resultant feature misrepresentation. For example, as shown in Figure 1-10a, when
the endoscope is tilted laterally at an angle ranging from zero to ± 8.6 degrees with
respect to the vocal fold plane, the nominal glottal gap area (determined from the
data of Figure 1-10a) varies from 1104 ± 22 pixels to 1251 ± 31 pixels between the
left, center, right, and down-tilted views on the plastic model with the rigid endoscope held
139 ± 2 mm from the glottis, which represents an error of ± 11 %. A similar error is
encountered when imaging the male larynx with either the rigid or flexible
endoscopes at distances of 104 ± 2 mm and 65 ± 2 mm, respectively. When the
effect of boundary width uncertainty (varied from one to five pixels) is factored in,
an additional measurement error ranging from 16 % to 45 % of GGA was obtained.
This is shown in Figure 1-10a.


[Figure 1-9 plots: panels (a) through (c) share the view axis Left, Center, Right, Down; panel (d) compares color samples 282A3 and 282A4.]

Figure  1-9.  Pilot  analysis  of  preliminary  database  imagery  to  determine  color  constancy 
(V = value, R = red, G = green, B = blue) between views obtained by left,
center,  right,  and  downward  endoscope  inclination  (a)  plastic  model  of  larynx 
imaged  with  oral  rigid  endoscope;  (b)  excised  male  larynx  imaged  with  oral 
rigid  endoscope;  (c)  plastic  model  imaged  with  flexible  endoscope  head;  (d) 
color  chart  imaged  with  centered  oral  rigid  endoscope  head.  Here,  g denotes 
the  gray  level  of  each  V,  R,  G,  and  B band  and  error  bars  represent  one 
standard  deviation. 


A well-known  problem  with  endoscopic  measurements  is  barrel  distortion. 
This  was  more  severe  in  the  flexible  endoscope  head,  which  has  a shorter  focal 
length  and  viewing  distance,  and  thus  produced  greater  curvature-of-field  effects. 



In this preliminary study, barrel distortion was estimated by imaging a piece
of millimeter-ruled graph paper at the standard viewing distances (104 ± 2 mm and 65 ±
2 mm, as specified previously). Deviation of the ruled lines from horizontal or vertical
was measured by first thinning the graph paper lines to their medial axis of
width one pixel, then measuring deviation from the best Bresenham line tangent to
the medial axis. The Bresenham line is based on the Bresenham algorithm, which
is commonly used to draw lines on low-resolution raster graphics devices such as computer
terminals or printers (Baden 1983). The resultant deviation, expressed in pixels, is
shown in Figure 1-10b for left, center, right, and downward-tilted viewing with the
rigid endoscope head. The flexible endoscope head did not yield sufficient contrast
in the received image to support accurate error measurement at spatial errors less
than ± 1.2 mm, and was thus not analyzed further.
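
For readers unfamiliar with the Bresenham algorithm, the sketch below shows the standard integer form of the line-drawing routine, followed by one possible way to measure how far a thinned grid line departs from it. The deviation measure is an assumption made for illustration: it compares the medial axis with the Bresenham line joining its two endpoints, which is not necessarily the "best tangent" fit used in the pilot study. The function names and the sample data are hypothetical.

```python
def bresenham_line(x0, y0, x1, y1):
    """Integer pixel coordinates of the line from (x0, y0) to (x1, y1)."""
    points = []
    dx = abs(x1 - x0)
    dy = -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    x, y = x0, y0
    while True:
        points.append((x, y))
        if x == x1 and y == y1:
            break
        e2 = 2 * err
        if e2 >= dy:      # step in x
            err += dy
            x += sx
        if e2 <= dx:      # step in y
            err += dx
            y += sy
    return points


def max_deviation(medial_axis):
    """Largest vertical offset (pixels) between a thinned, roughly horizontal
    grid line and the Bresenham line joining its endpoints."""
    (x0, y0), (x1, y1) = medial_axis[0], medial_axis[-1]
    reference = dict(bresenham_line(x0, y0, x1, y1))
    return max(abs(y - reference[x]) for x, y in medial_axis)


# Hypothetical one-pixel-wide medial axis of a bowed (barrel-distorted) grid line.
bowed_line = [(0, 0), (1, 1), (2, 2), (3, 1), (4, 0)]
deviation_pixels = max_deviation(bowed_line)
```

The same measure applies to vertical grid lines if the x and y coordinates are swapped before the call.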

The  preceding  errors,  taken  individually  or  in  combination  of  absolute  error, 
yield  large  errors  of  estimation  in  laryngeal  parameters  such  as  glottal  gap  area. 

However,  consistency  of  measurements,  despite  the  problems  discussed  in  this  pilot 
study, remains unexplored. If one or more of the software packages examined in
this study proves to yield consistent results, its measurements would be clinically
useful even if they were only relative. For example, measuring the pre versus
post  thyroplasty  change  in  glottal  gap  area  in  a unilateral  adductor  vocal  fold 
paralysis  case  would  be  meaningful  even  if  it  only  reflects  relative  measurements 
that  result  from  multiple  trials  on  the  same  subject. 

Being  able  to  document  the  change  in  GGA  provides  the  surgeon  and  the 
treatment  team  with  objective  evidence  that  the  procedure  has  successfully  closed 


39 


the  glottal  gap.  It  also  provides  speech  language  pathologists  with  the  ability  to 
document  treatment  outcomes  and/or  the  effectiveness  of  therapy  strategies  in 
behaviorally  modifying  the  laryngeal  dimensions.  Data  such  as  this  is  meaningful 
for  documenting  the  functional  outcomes  from  a patient  perspective,  medical 
perspective,  and  third  party  payer  perspective.  Important  prerequisites  are  inter- 
program and  inter-analyst  consistency.  Without  establishing  consistency,  the 
generated  data  would  probably  not  be  clinically  stable. 


[Figure 1-10 plots. Panel (a): glottal gap area A (pixels, axis 1000 to 3000) for the male larynx and the plastic model at Left, Center, Right, and Down views; series: inner boundary, and inner boundary with curvature effect. Panel (b): mm-grid, rigid scope, at Left, Center, Right, and Down views; series: vertical and horizontal, central and peripheral.]

Figure  1-10.  Effect  of  error  in  (a)  glottal  gap  area  (A,  in  pixels)  measurement  using  inner 
boundary  with  and  without  effect  of  vocal  fold  curvature;  and  (b)  rigid 
endoscope  barrel  distortion  (expressed  in  pixels)  measured  vertically  and 
horizontally  with  respect  to  a millimeter-ruled  grid  imaged  at  320  x 240- 
pixel  resolution  (spatial  sampling  error  = ± 0.1  mm,  repeatability  = 0.3  mm 
over  five  replicates). 


Purpose 

This  project  is  a first  step  towards  developing  imaging  software  that  will 
investigate  and  address  the  above-discussed  limitations.  This  project  set  out  to  test  inter- 
program and  inter-analyst  consistency  of  three  commercially  available  image  processing 
and analysis software packages: Adobe Photoshop 5.5, Gimp version 0.99.x/1.0 and Scion
Image Beta 4.2. The capabilities of these programs for the identification and calculation
of glottal gap area and total vocal fold length in a selected group of endoscopic images
were explored. This task was accomplished using a collection of image frames of a normal
subject and a patient with unilateral adductor vocal fold paralysis. Scion Image (the PC
version of the NIH Image software) was chosen for comparison with the other two
commercially available software packages because it is the most common package used
by speech-language pathologists and voice scientists in laryngeal image analysis.
Examples of studies based on NIH Image are Gonçalves and Leonard (1998), Omori et
al.  (1996,  1998),  and  Inagi  et  al.  (1997).  All  three  software  packages  have  similar  image 
processing  and  analysis  features.  Laryngeal  feature  measurement  was  initially  based  on 
previously  cited  research,  particularly  Omori  et  al.  (1996,  1998). 

The  dependent  variables  measured  included  the  glottal  gap  area  and  total  vocal 
fold  length.  Glottal  gap  area  (GGA)  was  defined  as  the  area  between  the  medial  margins 
of the vocal folds. Total vocal fold length (TVFL) was defined as the distance from
the anterior commissure to the tip of the vocal process. The
following  hypotheses  were  made: 

1.  There  will  be  no  significant  inter-program  differences  or  variability  in  the 
measurement  of  GGA  and  TVFL  for  the  normal  and  pathological  larynx. 

2.  There  will  be  no  significant  inter-analyst  differences  or  variability  for  GGA  and 
TVFL  within  each  of  the  three  programs  for  both  the  normal  and  pathological 
larynx. 
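
The two dependent variables defined above can be expressed computationally. The sketch below is one possible formulation, not the procedure built into any of the three packages: GGA is taken as the area enclosed by a traced outline of the medial vocal fold margins (computed with the shoelace formula), and TVFL as the straight-line distance between the two landmarks. The function names, the outline coordinates, and the landmark positions are hypothetical.

```python
import math


def polygon_area(vertices):
    """Shoelace formula: area enclosed by a closed polygon of (x, y) vertices."""
    twice_area = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]   # wrap around to close the outline
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def point_distance(p, q):
    """Euclidean distance in pixels between two (x, y) points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


# Hypothetical traced outline of the glottal gap (pixel coordinates):
# anterior commissure, left margin midpoint, posterior end, right margin midpoint.
glottal_outline = [(100, 50), (110, 120), (100, 190), (90, 120)]
gga = polygon_area(glottal_outline)              # area in square pixels

# Hypothetical landmarks: anterior commissure and tip of the vocal process.
tvfl = point_distance((100, 50), (100, 190))     # length in pixels
```

An area obtained from outline vertices and an area obtained by counting enclosed pixels can differ slightly along the boundary, which is one reason measurements from different programs may not agree exactly.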


CHAPTER  2 
METHODOLOGY 


This study examined three imaging software packages for their use in image
analysis. Specifically, Adobe Photoshop 5.5 (Adobe Systems Inc., CA), Gimp 0.99.x/1.0
(Image Manipulation Program, Experimental Computing Facility, University of
California at Berkeley) and Scion Image Beta 4.2 (Scion Corporation, Maryland) were
examined  in  terms  of  inter-program  and  inter-analyst  consistency  in  the  measurement  of 
glottal  gap  area  (GGA),  and  total  vocal  fold  length  (TVFL).  The  inter-program 
consistency  refers  to  the  comparison  of  the  three  software  packages  mentioned  above  in 
terms of the consistency of the measurements produced. Inter-analyst consistency refers
to the agreement among measurements produced by different analysts using the same
program. A collection of endoscopic image frames of a subject with normal voice
production and a patient with unilateral adductor vocal fold paralysis during voicing
was used for analysis.

Software  Packages 

A general  description  of  the  features  of  the  software  packages  used  in  this  study  is 
crucial  in  pointing  out  the  differences  among  them  and  the  degree  of  accuracy  each 
provides  to  image  analysis  and  processing  of  the  laryngeal  structures. 

Adobe  Photoshop 

Adobe  Photoshop  5.5  is  a software  package  that  was  originally  designed  for 
creating and editing web graphics. It has the capability to read and write a large number
of image file formats. Examples are GIF, TIFF, BMP, JPEG, RAW, and PDF. It also supports a
number of standard image processing functions, such as contrast enhancement, hue /
saturation adjustment (perception of change in color because of changes in the wavelength
of the perceived light; Price 1997), equalization, adding and reducing noise, border detection,
and thresholding. It can be used to measure area, mean, perimeter, length, etc. of
user-defined regions of interest. Measurement results can be printed, exported to text files, or
copied to the Clipboard (Adobe Photoshop 5.5 User Guide Supplement 1999).

Gimp 

Gimp version 0.99.x/1.0 is a freely distributed image manipulation program
suitable for image painting, composition and processing under X11 on Unix platforms.
Gimp was originally written by two students, Peter Mattis and Spencer Kimball,
attending the University of California at Berkeley. Tor Lillqvist, a graduate of Helsinki
University of Technology, Finland, developed a new Windows version and released it on
05-08-2000. This version is available for free download (see download page at
http://www.gimp.org/~tml/gimp/win32/downloads.html). It can be used as a simple
paint program, a photo manipulation program, or a sophisticated image processing tool.
Gimp can be expanded with plugins and extensions tailored to the user's needs. Features and
capabilities include painting tools, such as brush, pencil, airbrush, clone, etc. Gimp also
has the capability to open multiple images at the same time. Images can be
flipped, rotated, scaled, sheared, saved, loaded, converted and displayed. Supported image
file formats are GIF, JPEG, XPM, TIFF, TGA, MPEG, PS, PDF, PCX, and BMP.
Selection tools include rectangle, ellipse, free, fuzzy, bezier, and intelligent selection.
Plugins allow for easy addition of new file formats and new filters. Gimp supports 8, 15, 16, and
24 bit displays with RGB, Grayscale and Indexed color modes. It also can create multiple
views of the same images to facilitate complex tasks. It also has crop, color picker,
bucket-fill and text tools. Image processing filters include blur, edge detect, pixelize, etc.
(Gimp User Manual 1999).

Scion  Image 

Scion  Image  Beta  4.2  is  the  IBM  PC  version  of  the  NIH  Image  processing  and 
analysis  program  that  is  based  on  the  Macintosh  platform.  Therefore,  Scion  Image 
features  are  described  below  in  some  detail. 

Scion Image can acquire, display, edit, enhance, and analyze endoscopic images.
The software can be freely downloaded from the Scion Corporation website at
http://www.scioncorp.com/frames/fr_scion_products.htm. This program reads and writes TIFF and
BMP files only, which places some limitations on the user. Although TIFF and BMP files
preserve image details, they require a larger amount of computer memory than, for
example, graphics interchange format (GIF) files, which can be read by both Adobe
Photoshop and Gimp, and which preserve image details while requiring less computer
memory (CompuServe Inc. 1987, http://www.tnt.uni-hannover.de/soft/imgproc/fileformats/gif87.txt).
Scion Image supports many standard image processing
functions,  such  as  contrast  enhancement,  smoothing,  sharpening,  and  edge  detection.  It 
can  be  used  to  measure  area,  mean,  perimeter,  etc.  of  user-defined  regions  of  interest. 
Measurement  results  can  be  printed,  exported  to  text  files,  or  copied  to  the  Clipboard. 
Scion  Image  manipulates,  displays  and  analyzes  two-dimensional  images.  Image  pixels 
can  be  represented  by  8-bit  integers,  ranging  in  value  from  0 to  255.  Scion  Image  displays 
pixels with a value of zero as white and those with a value of 255 as black. Images, measurement results,
and other functions are displayed in windows that can be dragged and resized. The look-up
table (LUT) window displays the current video look-up table. The Tools window
contains  tools  for  making  selections,  editing  images,  drawing  text,  and  making 
measurements.  The  Map  window  is  used  for  adjusting  the  contrast  and  brightness  of 
images  and  for  enabling  and  disabling  thresholding.  The  Info  window  displays  status 
information,  such  as  cursor  position  and  value,  and  the  most  recent  measurement  results. 
The  Results  window  displays  the  current  table  of  measurement  results.  The  Plot  window 
displays density profile and calibration plots. Images can be stored in the PC’s internal
memory (random access memory, or RAM). The Import and Export commands allow
image information (e.g., measurements) to be read from and written to text formats
that are compatible with spreadsheet and statistical analysis programs.

Using  Scion  Image  tools,  a user  can  edit  color  and  gray  scale  images,  draw  lines, 
rectangles and text. Images can be flipped and rotated, and a measuring scale can be set.
A user  can  also  use  multiple  windows  and  eight  levels  of  magnification.  All  editing, 
filtering,  and  measurement  functions  operate  at  any  level  of  magnification  and  are 
undoable. 

Image  Enhancement  operations,  via  the  Process  menu,  provide  filters  for 
smoothing,  sharpening,  finding  edges  and  reducing  noise  in  images.  Other  functions 
include  converting  grayscale  images  to  images  consisting  of  only  black  and  white  pixels 
and  commands  to  process  such  images.  Arithmetic  sub-menus  can  be  used  to  add  or 
multiply  an  image  by  a constant. 

Manual area measurement is made by first outlining a region of interest using the
rectangular, oval, polygonal, or freehand selection tool. The second step is to select the
Measure command, which computes the area, mean gray value, and the minimum and
maximum gray values. Distances can be measured by making a straight, freehand, or
segmented line selection, and then using the Measure command. The wand tool
automatically  outlines  structures  isolated  using  thresholding  (without  taking  into  account 
the  gradient  of  the  vocal  fold  boundaries).  Results  from  the  most  recent  measurement  are 
displayed  in  the  Info  window.  The  Show  Results  command  displays  a table  of  results 
since  the  last  time  the  Reset  command  was  used.  The  Analyze  Particles  command 
automatically  counts  and  measures  features  of  interest.  This  requires  thresholding  to 
discriminate  objects  of  interest  from  surrounding  background  based  on  their  gray  values. 
Scion  Image  has  two  thresholding  methods.  In  thresholding  mode  (Options  menu),  all 
pixels  equal  to  or  greater  than  a single  threshold  level  are  displayed  in  black,  and  all  other 
pixels  (the  background)  are  displayed  in  white.  In  Density  Slicing  mode  (Options  menu), 
all  pixels  between  lower  and  upper  thresholds  (T1  and  T2)  are  highlighted  in  red.  For 
both  modes,  the  user  can  adjust  T1  and  T2  by  dragging  the  look-up  table  (LUT)  tool  in 
the  LUT  window.  For  successful  thresholding,  it  may  be  necessary  to  use  the  Subtract 
Background  command  to  remove  the  effects  of  uneven  illumination.  The  Open  command 
is  used  to  load  and  display  images  (in  TIFF  and  BMP  format),  text  files,  color  look-up 
tables  or  region  of  interest  outlines.  Scion  Image  can  read  and  manipulate  TIFF  files 
created by other programs (Scion Image Beta 4.2 for Windows Manual 1999).
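
The two thresholding modes described above can be illustrated with a minimal sketch. This is not Scion Image code; the function names, the inclusive treatment of the lower and upper thresholds (T1 and T2), and the small grayscale patch are assumptions made only for illustration.

```python
def threshold(image, level):
    """Single-level threshold: mark pixels with gray value >= level."""
    return [[1 if px >= level else 0 for px in row] for row in image]


def density_slice(image, t1, t2):
    """Density slicing: mark pixels whose gray value lies in [t1, t2].

    Treating both bounds as inclusive is an assumption of this sketch.
    """
    return [[1 if t1 <= px <= t2 else 0 for px in row] for row in image]


def count_marked(mask):
    """Number of marked pixels, i.e. the area of the segmented region in pixels."""
    return sum(sum(row) for row in mask)


# Hypothetical 3 x 4 grayscale patch (values 0-255), not taken from the study images.
patch = [
    [10, 40, 200, 220],
    [15, 120, 210, 230],
    [12, 35, 130, 225],
]

binary = threshold(patch, 128)           # marks the six brightest pixels
sliced = density_slice(patch, 100, 150)  # marks only the mid-gray pixels (120, 130)
```

Counting the marked pixels in either mask gives an area in pixels, which is the quantity that commands such as Measure or Analyze Particles report for a thresholded region.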

Subjects 

This  study  did  not  involve  any  direct  contact  with  human  subjects.  Rather,  this 
study  examined  a pre-existing  database  of  endoscopic  images  obtained  from  the 
Department of Otolaryngology at the University of Florida. Patients who are seen in the
Department of Otolaryngology undergo a voice evaluation and an endoscopic and
videostroboscopic examination of the larynx. The Institutional Review Board (IRB) for
the  University  of  Florida  approved  the  project  on  8-3-2000  (IRB  no.  106-2000).  At  the 
time  these  images  were  obtained,  the  normal  subject  did  not  have  any  abnormal 
articulation,  resonance,  or  language  ability,  abnormal  hearing,  allergies  or  colds  on  the 
day  of  testing,  or  smoking  history.  The  patient  with  unilateral  adductor  vocal  fold 
paralysis  did  not  have  any  symptoms  of  superior  laryngeal  nerve  dysfunction  (i.e.  tensing 
and  lengthening  of  the  vocal  fold  was  present),  or  an  idiopathic  etiology  of  unilateral 
vocal fold paralysis. Those observations were reported by the resident otolaryngologist,
accompanied by the resident speech-language pathologist, at the time of vocal
evaluation.

Three  endoscopic  image  frames  of  the  normal  subject's  larynx  and  three 
endoscopic  image  frames  of  the  patient  with  unilateral  recurrent  laryngeal  nerve  paralysis 
(URLNP) were used for image processing and analysis in this study. For the patient, only
the pre-thyroplasty image frames were used in this study. For both subjects, only the
images that permitted the visualization of the tip of the anterior commissure and the tip of
the vocal process in the same image were included in the study. One condition was
analyzed: maximum opening during voicing. This condition was used because it imposes
greater  difficulty  upon  GGA  and  TVFL  measurement  than  any  other  condition  (such  as 
breathing  or  maximum  closure  during  voicing).  Obtaining  endoscopic  images  of  the 
vocal  folds  while  the  subject  is  breathing  could  reduce  measurement  error  because  the 
vocal  folds  are  less  affected  by  other  factors  such  as  vocal  fold  stretching  during 
phonation. However, during phonation, the paralyzed vocal fold increases in length
influenced by the stretching force of the normal vocal fold (Inagi et al. 1997). This effect
is a result of the functional unity between the two vocal folds.

Each  image  frame  of  the  subjects'  larynx  was  measured  ten  times  by  three 
analysts,  two  expert  and  one  naive.  Both  expert  analysts  were  speech-language 
pathologists  with  a minimum  of  five  years  of  experience  working  with  the  normal  and 
disordered  human  voice.  The  naive  analyst  was  an  undergraduate  student  of 
Communication  Sciences  and  Disorders  with  no  experience  working  with  the  human 
voice.  All  analysts  were  trained  on  measurement  procedures.  Prior  to  the  analysis, 
measurement  practice  sessions  for  all  analysts  revealed  that  intra-analyst  error  did  not 
exceed  5%  across  10  trials.  Other  environmental  variables  that  were  controlled  during 
data  measurement  included:  user  to  screen  distance,  user's  head  movement,  measurement 
time,  user  fatigue,  and  screen  resolution.  User  to  screen  distance  was  set  at  25  inches, 
securing  a comfortable  posture  for  the  analyst  and  the  ability  to  detect  image  details. 
Controlling  user  to  screen  distance  and  user's  head  movement  was  achieved  using  a 
modified  motorcycle  helmet  fixed  on  a board  at  25-inch  distance.  All  measurements  took 
place at the same setting and nearly at the same time (12:00-2:00 p.m.) every day for four
weeks. The analysts started measurement when they were physically comfortable and
discontinued when signs of fatigue were noticed. Screen resolution was set at 1024 x 768
True Color (32 bit). This high resolution provided better visibility for the analysts and
complied with the resolution required by the three imaging software packages.

Equipment  and  Procedures 

The laryngeal images were digitized with Adobe Premiere 5.0 (Adobe Systems
Inc., CA), and saved on a disk as BMP files. The capturing procedure was done in the
following manner: endoscopic images were played by a Super VHS Sony Video
Recorder and displayed on a Sony television. The Adobe Premiere File menu was opened and
the Capture submenu was selected. In that submenu Movie Capture was selected. The
option  Record  was  then  clicked.  Images  were  digitized  using  a digitizing  board  (Bravado 
DV  2000,  Pinnacle  Systems  Inc.,  CA)  installed  on  a PC  with  an  Accelerator  III  3D 
graphics  card.  After  recording  the  endoscopic  images  of  interest,  they  were  displayed 
frame  by  frame.  Individual  frames  were  examined  to  ensure  that  all  of  the  images 
selected  showed  the  entire  anteroposterior  length  of  the  vocal  folds.  To  begin  image 
processing  and  analysis,  the  endoscopic  images  were  retrieved  with  Adobe  Photoshop 
5.5,  Gimp  version  0.99.x/1.0  and  Scion  Image  4.2.  Images  were  retrieved  using  an  IBM 
compatible PC, 800 MHz Pentium III system. The size of the viewing screen (computer
monitor) was 21 inches (diagonal picture area) to permit the analyst to view the image at
400%  magnification  as  discussed  below. 

Adobe  Photoshop 

The images were retrieved by selecting the Open submenu in the File menu. Images were
opened as Bitmap RGB (color) images. Image size was set at 320 X 240 pixels and was
verified by selecting Image Size in the Image menu. To ensure that the units of
measurement were pixels, the File menu was selected and the Preferences submenu led to
the Units and Rulers option. The color data was discarded by selecting the Mode
submenu in the Image menu and then selecting Grayscale. Using the Navigator
window, the image was magnified to 400% (or 4:1) of its original size. The purpose of
this  process  was  to  reveal  the  gradient  nature  of  the  vocal  fold  edges,  helping  the  analyst 
to improve manual tracing of both the GGA and the TVFL. The GGA was computed as
follows: the Lasso tool was selected from the Tool palette and the glottal gap was
manually traced. The first medial pixel was considered a point of internal reference. The
traced area histogram was displayed by selecting it from the Image menu. The histogram
gray levels were highlighted and the GGA was obtained. The histogram was then closed
by clicking OK. The same image, with the same size and magnification, was used
to obtain TVFL. The Measure tool was selected from the Tool palette and a straight line
was extended from the point of the anterior commissure to the point of the tip of the
vocal process. The distance between those two points was obtained. The units of
measurement for both the TVFL and GGA were pixels. Figure 2-1 is a general
representation of the traced GGA and TVFL. The last step was to normalize the GGA as
follows (Omori et al. 1996, 1998):

Normalized Glottal Gap Area (NGGA) = GGA / (TVFL)^2 x 100

The GGA was normalized by the TVFL to control changes in the GGA
due to the vertical movements of the endoscope during endoscopy. These vertical
movements create axial distortion, whereby objects proximal to the endoscope appear
larger and more distorted than distant objects. The TVFL was squared in normalizing the
GGA because GGA is a two-dimensional measure. It also has a non-uniform shape that
makes its measurement somewhat difficult.

Figure 2-1. Tracing vocal fold edges: (a) traced glottal gap area (GGA); (b) traced
total vocal fold length (TVFL).
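
The normalization step can be written out as a short sketch. The function name and the numeric values are hypothetical and are used only to show why dividing by the squared TVFL makes the measure independent of image scale: doubling the magnification doubles the length but quadruples the area.

```python
def normalized_glottal_gap_area(gga_pixels, tvfl_pixels):
    """NGGA = GGA / TVFL^2 x 100, with GGA in pixels (area) and TVFL in pixels (length)."""
    if tvfl_pixels <= 0:
        raise ValueError("TVFL must be a positive length in pixels")
    return gga_pixels / tvfl_pixels ** 2 * 100


# Hypothetical measurements, not taken from the study data.
ngga_near = normalized_glottal_gap_area(600, 88.0)

# The same larynx seen at twice the scale: area x 4, length x 2.
ngga_far = normalized_glottal_gap_area(2400, 176.0)
```

Under these assumed values both calls return the same NGGA, which is the property the normalization is meant to provide when the endoscope moves toward or away from the vocal folds.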

Gimp 

The  images  were  retrieved  by  opening  the  Open  submenu  in  the  File  menu. 
Images  were  opened  as  Bitmap  RGB  (color)  images.  Image  size  was  set  at  320  X 240 
pixels.  The  size  was  verified  by  selecting  Image  Size  in  the  Image  menu.  The  color  data 
was discarded by selecting the Mode submenu in the Image menu and then selecting
Grayscale. Using the Navigator window, the image was magnified to 400% (or 4:1) of
its original size. This was done by opening the View menu and selecting the Zoom
submenu. The GGA was computed as follows: the Lasso tool was selected from the Tool
palette and the glottal gap was manually traced. The first medial pixel was considered a
point of internal reference. The traced area was copied to a new blank image of the same
size and magnification. The histogram generated by Gimp displays the area of the entire
displayed image regardless of the traced area; thus, extracting the traced area and placing
it in a new image ensured correct measurement of GGA. The traced area histogram was
then displayed by selecting it from the Image menu. The histogram gray levels were
highlighted and the GGA was obtained. The histogram was closed by clicking the Close
tab. The original image, with the same size and magnification, was used to obtain TVFL.
The Measure-Distances-and-Angles tool was selected from the Tool window and a
straight line was extended from the point of the anterior commissure to the point of the
tip of the vocal process. The distance between those two points was obtained. The GGA
was normalized using the normalization equation given above. The units of measurement
were pixels.
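
In all three programs the two raw measurements reduce to a pixel count inside the traced region and a straight-line distance between two landmark pixels. The sketch below shows both quantities; it is not code from Photoshop, Gimp, or Scion Image, and the mask, coordinates, and function names are hypothetical.

```python
import math


def area_in_pixels(mask):
    """Count the pixels that fall inside a traced region (truthy entries of the mask)."""
    return sum(1 for row in mask for inside in row if inside)


def length_in_pixels(start, end):
    """Euclidean distance between two (x, y) pixel coordinates."""
    return math.hypot(end[0] - start[0], end[1] - start[1])


# Hypothetical traced region: 1 marks a pixel inside the outline.
traced = [
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
]
gga = area_in_pixels(traced)

# Hypothetical landmark coordinates for the anterior commissure and the
# tip of the vocal process.
tvfl = length_in_pixels((10, 20), (70, 100))
```

Counting only the pixels inside the trace mirrors the reason the Gimp procedure copies the selection into a new image before reading the histogram: a count taken over the whole displayed image would overstate the GGA.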




Scion  Image 

The  images  were  retrieved  by  opening  the  Open  submenu  in  the  File  menu. 
Images were opened as BITMAP grayscale images, because Scion Image lacks the
capability to open RGB images unless they are opened as a stack. Image size was set at
320 X 240 pixels. The size was verified by selecting Get Info in the File menu. Using the
Magnifying Glass tool in the Tool palette, the image was magnified to 400% (or 4:1) of its
original size. The GGA was computed as follows: the Free Hand tool was selected from
the Tool palette and the glottal gap was manually traced. The first medial pixel that could
be located (the first medial pixel of the 3-6 pixels comprising the vocal fold boundary)
was considered a point of internal reference. The Measure option was
selected from the Analyze menu, and then the option Show Results was selected. The
same image, with the same size and magnification, was used to obtain TVFL. The
Straight Line tool was selected from the Tool palette and a straight line was extended from
the point of the anterior commissure to the point of the tip of the vocal process. Again the
Measure option was selected from the Analyze menu, then the option Show Results was
selected. The glottal gap area was normalized using the equation mentioned above. The
units of measurement were pixels.


Statistical  Analysis 

The design of this study was a 3 x 3 x 3 x 2 x 2 x 10 design, i.e. 3 programs (software
packages) x 3 analysts x 3 image frames x 2 conditions (normal vs. pathological larynx) x
2 dependent variables (NGGA, TVFL) x 10 trials, which represented a General Linear
Model (GLM) for Mixed Design Repeated Measures ANOVA. The advantage of this
design lies in its ability to take subject-to-subject differences (which are usually large)
out of the error term of the statistical general model. The result is a considerable increase
in  statistical  power.  With  repeated  measures,  the  within-cell  variability  does  not  exist 
because  there  is  only  one  observation  per  cell  (i.e.  the  same  dependent  variable  is 
measured  across  image  frames).  The  data  set  satisfied  the  assumptions  of  the  Mixed 
Repeated  Measures  Model,  which  are  normality,  independence  and  sphericity.  Because 
the  data  had  equal  N,  the  normality  assumption  was  met.  The  independence  assumption 
was  also  met,  because  there  were  no  dependencies  among  scores  (each  score  was 
achieved by the three analysts independently). The sphericity assumption (equal variance
and covariance) was checked by running Mauchly's Test of Sphericity on the
data for both dependent variables. The sphericity assumption was confirmed for both
dependent variables. Meeting the sphericity assumption was important because the
repeated measures model assumes that the dependent variables have equal variance and
are correlated. Violating this assumption would inflate the F ratio (i.e. underestimate the
error term in the model). The result would be an increased Type I error rate, in which case
a corrected F ratio should be reported (Shavelson 1981).
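
The size of the design can be checked by enumerating its cells. The sketch below only counts measurements; the labels are taken from the design description, while the variable names are assumptions of this illustration.

```python
from itertools import product

programs = ["Adobe Photoshop", "Gimp", "Scion Image"]
analysts = [1, 2, 3]
conditions = ["normal", "pathological"]
frames = [1, 2, 3]          # three image frames per condition
trials = range(1, 11)       # ten repeated measurements
dependent_variables = ["NGGA", "TVFL"]

# One entry per measurement of a single dependent variable.
cells = list(product(programs, analysts, conditions, frames, trials))
per_variable = len(cells)
total = per_variable * len(dependent_variables)
```

With these factor levels the enumeration gives 540 measurements for each dependent variable and 1080 in total.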

The probability level was initially set at 0.05 (p < 0.05). Two separate analyses
were made, one for each dependent variable (NGGA and TVFL). Therefore, the
Bonferroni procedure was used to correct for Type I error (the probability level = 0.05 / 2
= 0.025, where the denominator indicates the number of tests run on the same set of data
points). The between-subject factor was the condition (normal vs. pathological larynx).
The within-subject factors were analysts, programs, and trials (repetitions). All
within-subject and between-subject factors were treated as fixed factors, because the focus of
this study was to test those specific factors using a purposeful sample and not to
generalize the results across the population. The variance component of the hypotheses
was examined by comparing standard deviations. The purpose was to reach a
judgment regarding the variability produced by each of the programs and the analysts in
the study. The data were analyzed using SPSS version 10.0 for Windows (SPSS Inc., IL).
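
The Bonferroni adjustment described above amounts to dividing the family-wise probability level by the number of tests. A minimal sketch follows; the function names are hypothetical, and the two p-values passed to the decision rule are illustrative inputs rather than results quoted from the analysis.

```python
def bonferroni_alpha(family_alpha, n_tests):
    """Per-test probability level under the Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return family_alpha / n_tests


def is_significant(p_value, alpha):
    """Decision rule: reject the null hypothesis when p is below alpha."""
    return p_value < alpha


# Two analyses (NGGA and TVFL) run on the same set of data points.
alpha_per_test = bonferroni_alpha(0.05, 2)

# Illustrative p-values: one below and one above the adjusted level.
below = is_significant(0.011, alpha_per_test)
above = is_significant(0.028, alpha_per_test)
```

A p-value of 0.028 would pass an unadjusted 0.05 level but not the adjusted 0.025 level, which is the practical consequence of the correction.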


CHAPTER  3 
RESULTS 


Descriptive  Statistics 

Means and standard deviations were calculated to examine data trends. Tables 3-1
through 3-6 show the means and the standard deviations for the three programs (Adobe
Photoshop, Gimp and Scion Image), three analysts and three frames measured in the two
conditions (normal and pathological) for both dependent variables. The data showed that
mean NGGA measurements obtained by Analyst 3 (experienced) using Adobe Photoshop
tended to be higher than results obtained by Analysts 1 and 2 across conditions and
across frames (Table 3-1 and Table 3-2). Analyst 3 also obtained higher NGGA standard
deviations  than  those  obtained  by  Analysts  1 and  2,  except  for  the  standard  deviation  of 
Normal  Frame  3,  where  results  obtained  by  Analyst  3 showed  a higher  standard  deviation 
than  Analyst  1 but  a slightly  lower  standard  deviation  than  Analyst  2.  For  TVFL  (Table 
3-2)  Analyst  3 results  had  lower  means  than  Analysts  1 and  2,  except  for  Normal  Frame 
2,  where  higher  means  were  obtained.  Another  exception  was  for  Normal  Frame  3,  where 
results  obtained  by  Analyst  3 showed  lower  means  than  Analyst  1 but  higher  means  than 
Analyst  2.  Standard  deviations  were  generally  lower  except  for  Normal  Frame  1,  where 
the  standard  deviation  was  markedly  higher  than  those  of  Analysts  1 and  2.  Also,  Analyst 
3 obtained  a lower  standard  deviation  than  Analyst  1,  but  higher  than  Analyst  2 for 
Pathological  Frame  2.  Analyst  2 obtained  markedly  higher  standard  deviations  in  Normal 
Frame  2 and  Pathological  Frame  3. 




Table 3-1. Means and standard deviations (SD) for the NGGA measurements across
analysts using Adobe Photoshop.

Frame    Condition     Statistic  Analyst 1  Analyst 2  Analyst 3
Frame 1  Normal        Mean            7.21       7.89       9.57
                       SD              0.19       0.39       1.31
         Pathological  Mean            8.90       8.20      10.40
                       SD              0.21       0.30       0.49
Frame 2  Normal        Mean            7.32       7.86       8.66
                       SD              0.29       0.40       0.61
         Pathological  Mean            9.00       8.68      10.49
                       SD              0.48       0.26       0.56
Frame 3  Normal        Mean            6.87       7.26       8.33
                       SD              0.21       0.38       0.37
         Pathological  Mean            8.98       9.01      10.25
                       SD              0.25       0.58       0.68

Table 3-2. Means and standard deviations (SD) for the TVFL measurements across
analysts using Adobe Photoshop.

Frame    Condition     Statistic  Analyst 1  Analyst 2  Analyst 3
Frame 1  Normal        Mean           88.84      85.21      83.80
                       SD              0.69       0.78       4.06
         Pathological  Mean          121.21     122.09     117.35
                       SD              1.06       0.78       0.49
Frame 2  Normal        Mean           89.44      86.76      89.55
                       SD              1.31       3.44       0.77
         Pathological  Mean          116.03     115.76     112.58
                       SD              1.39       0.83       1.06
Frame 3  Normal        Mean           88.93      85.96      86.40
                       SD              1.32       1.01       0.55
         Pathological  Mean          119.73     118.93     117.11
                       SD              1.13       3.55       0.68

Table 3-3 shows the means and standard deviations for NGGA across analysts
using Gimp. The same trend was observed as with Adobe Photoshop and Scion
Image, where results obtained by Analyst 3 showed higher NGGA means across frames
than Analysts 1 and 2. Standard deviations were generally lower with the exception of
Normal Frame 1, where the standard deviation obtained by Analyst 3 was higher than
that obtained by Analyst 1, but lower than that obtained by Analyst 2. Another exception
was Pathological Frame 3, where the standard deviation obtained by Analyst 3 was lower
than that of Analyst 1 but higher than that of Analyst 2. TVFL measurements (Table 3-4)
showed lower means obtained by Analyst 3 than those obtained by Analysts 1 and 2
across normal and pathological frames, except for Normal Frames 1, 2 and 3. The mean
obtained by Analyst 3 for Normal Frame 1 was higher than those of Analyst 2 and
Analyst 1. The mean obtained by Analyst 3 for Normal Frame 2 was lower than that of
Analyst 1 but higher than that of Analyst 2. Standard deviations obtained by Analyst 3
were lower than those obtained by Analysts 1 and 2, except for Pathological Frame 1,
where the standard deviation obtained by Analyst 3 was lower than that of Analyst 1 but
higher than that of Analyst 2.

Table 3-3. Means and standard deviations (SD) for the NGGA measurements across
analysts using Gimp.

Frame    Condition     Statistic  Analyst 1  Analyst 2  Analyst 3
Frame 1  Normal        Mean           11.32      11.10      11.62
                       SD              0.45       0.62       0.54
         Pathological  Mean           12.75      12.62      13.97
                       SD              0.62       0.39       0.34
Frame 2  Normal        Mean           11.22      10.98      12.75
                       SD              0.52       0.44       0.42
         Pathological  Mean           13.44      12.85      15.51
                       SD              0.58       0.68       0.15
Frame 3  Normal        Mean           11.30      10.99      12.33
                       SD              0.50       0.44       0.42
         Pathological  Mean           13.00      12.64      14.15
                       SD              0.53       0.36       0.38

Results obtained using Scion Image showed trends similar to those of Adobe
Photoshop and Gimp. The NGGA means obtained by Analyst 3 were higher than those
obtained by Analysts 1 and 2 for both conditions. Standard deviations were also higher
except for Normal Frame 1 and Normal Frame 3. The standard deviation obtained by
Analyst 3 for Normal Frame 1 was lower than that obtained by Analyst 1 and higher than
that obtained by Analyst 2. The standard deviation obtained by Analyst 3 for Normal
Frame 3 was lower than that obtained by both Analyst 1 and Analyst 2. Pathological
Frame 2 was also an exception, where the standard deviation obtained by Analyst 3 was
lower than that obtained by Analyst 1 and higher than that obtained by Analyst 2 (Table
3-5). As for TVFL results (Table 3-6), means obtained by Analyst 3 were lower across
both conditions than those obtained by Analysts 1 and 2 across all frames.


Table 3-4. Means and standard deviations (SD) for the TVFL measurements across
analysts using Gimp.

Frame    Condition     Statistic  Analyst 1  Analyst 2  Analyst 3
Frame 1  Normal        Mean           88.67      85.02      89.54
                       SD              1.22       1.42       0.88
         Pathological  Mean          121.22     121.03     118.09
                       SD              1.06       0.56       0.77
Frame 2  Normal        Mean           89.25      85.02      87.75
                       SD              1.80       1.21       0.56
         Pathological  Mean          115.58     115.76     112.88
                       SD              1.18       0.87       0.57
Frame 3  Normal        Mean           88.26      85.39      85.62
                       SD              0.82       0.89       0.43
         Pathological  Mean          121.06     121.77     117.43
                       SD              1.20       0.75       0.05

Standard deviations obtained by Analyst 3 were generally lower across both conditions.
The standard deviation obtained by Analyst 3 for Normal Frame 2 was lower than that
obtained by Analyst 1 but higher than that obtained by Analyst 2. Also, the standard
deviations obtained by Analyst 3 for Pathological Frame 1 and Pathological Frame 3
were lower than those obtained by Analyst 1 but higher than those obtained by Analyst 2.
In general, Analyst 3 obtained higher NGGA means but lower TVFL means across
programs, conditions and frames than Analysts 1 and 2. Also, NGGA standard deviations
obtained by Analyst 3 were generally higher than those obtained by Analysts 1 and 2
across programs, conditions and frames. However, TVFL standard deviations obtained by
Analyst 3 were generally lower than those obtained by Analysts 1 and 2 across programs,
conditions and frames.

Table 3-5. Means and standard deviations (SD) for the NGGA measurements across
analysts using Scion Image.

Frame    Condition     Statistic  Analyst 1  Analyst 2  Analyst 3
Frame 1  Normal        Mean            6.65       6.65       8.74
                       SD              0.48       0.30       0.42
         Pathological  Mean            8.23       8.09      10.42
                       SD              0.25       0.20       0.42
Frame 2  Normal        Mean            6.59       6.73       9.00
                       SD              0.20       0.25       0.58
         Pathological  Mean            8.53       8.40      11.02
                       SD              0.36       0.29       0.30
Frame 3  Normal        Mean            6.19       6.49       7.45
                       SD              0.42       0.48       0.32
         Pathological  Mean            8.39       8.27      11.36
                       SD              0.25       0.22       0.27

Table  3-7  shows  the  means  and  the  standard  deviations  for  the  three  programs 
across the two conditions for the two dependent variables. The data show that Scion
Image  produced  lower  means  and  standard  deviations  than  Adobe  Photoshop  and  Gimp 
across  the  two  conditions  for  the  two  dependent  variables.  The  only  exception  was  the 
standard  deviation  for  TVFL  for  the  normal  condition,  where  Scion  Image  produced 
lower  standard  deviation  than  Adobe  Photoshop  but  higher  standard  deviation  than  Gimp. 
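
Because every analyst-by-frame cell contains the same number of trials, a program-level mean is simply the unweighted average of its nine cell means. The sketch below applies this to the Adobe Photoshop NGGA cell means reported in Table 3-1; the variable names are assumptions, and the same shortcut does not hold for standard deviations, which also depend on the spread between cells.

```python
from statistics import mean

# Cell means for NGGA using Adobe Photoshop (Table 3-1), listed frame by frame
# as Analyst 1, Analyst 2, Analyst 3.
photoshop_ngga_cell_means = {
    "normal": [7.21, 7.89, 9.57, 7.32, 7.86, 8.66, 6.87, 7.26, 8.33],
    "pathological": [8.90, 8.20, 10.40, 9.00, 8.68, 10.49, 8.98, 9.01, 10.25],
}

# Equal cell sizes, so the average of cell means equals the pooled mean.
program_means = {
    condition: mean(values)
    for condition, values in photoshop_ngga_cell_means.items()
}
```

The two averages agree, to within rounding of the tabled cell means, with the Adobe Photoshop NGGA means of 7.88 (normal) and 9.32 (pathological) reported in Table 3-7.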


Table 3-6. Means and standard deviations (SD) for the TVFL measurements across
analysts using Scion Image.

Frame    Condition     Statistic  Analyst 1  Analyst 2  Analyst 3
Frame 1  Normal        Mean           91.77      90.18      85.40
                       SD              1.33       1.69       1.19
         Pathological  Mean          121.90     121.33     115.13
                       SD              1.16       0.78       1.06
Frame 2  Normal        Mean           94.37      92.11      86.23
                       SD              1.84       0.67       1.18
         Pathological  Mean          117.93     117.37     110.86
                       SD              2.14       1.36       0.56
Frame 3  Normal        Mean           91.86      90.84      87.27
                       SD              2.39       0.78       0.07
         Pathological  Mean          122.15     121.55     114.01
                       SD              1.06       0.52       1.02

Table 3-7. Means and standard deviations (SD) for the NGGA and TVFL measurements
across programs.

Variable  Condition     Statistic  Adobe Photoshop    Gimp  Scion Image
NGGA      Normal        Mean                  7.88   11.51         7.16
                        SD                    0.58    0.54         0.54
          Pathological  Mean                  9.32   13.43         9.17
                        SD                    0.44    0.63         0.39
TVFL      Normal        Mean                 87.21   87.17        84.91
                        SD                    2.20    1.43         1.90
          Pathological  Mean                117.87  118.31       115.55
                        SD                    2.73    2.71         2.22


Inferential Statistics


Normalized Glottal Gap Area (NGGA)


Between-Subject Factors

The ANOVA detected a significant difference in the main effect of condition, F
(9412.733), P = 0.000.


Within-Subject Factors

Table 3-8 provides the relevant ANOVA statistical results for the within-subject
factors. There was no significant difference in the main factor of trial, F (1.780), P =
0.107. However, a significant difference in the main effect of program, F (562.391), P =
0.000 was detected. Although the main effect of analyst reached statistical significance, F
(275.424), P = 0.000, the ANOVA detected a statistically significant two-way interaction
between analyst and condition. Because of the presence of the two-way interaction,
evaluating the main factors (analyst and condition) separately would be misleading, as the
main factors influenced each other in such a way that their individual effects must be
separated statistically to evaluate any differences. Figure 3-1 is a graphical
representation of the interaction.


Total  Vocal  Fold  Length  (TVFL) 


Between-Subject Factor

The ANOVA detected a significant difference in the main effect of condition, F
(16232.905), P = 0.000.


Table 3-8. Analysis of variance examining the effects of program, analyst, condition
and interactions for NGGA.

Source                                Type III SS   df  Mean Square        F  P
Program                                  2023.418    2     1011.709  562.391  0.000*
Program x condition                         8.456    2        4.228    2.350  0.157
Error (program)                            14.392    8        1.799
Analyst                                   346.485    2      173.242  275.424  0.000*
Analyst x condition                        10.654    2        5.327    8.469  0.011*
Error (analyst)                             5.032    8        0.629
Trial                                       2.396    9        0.266    1.780  0.107
Trial x condition                           0.976    9        0.108    0.725  0.683
Error (trial)                               5.385   36        0.150
Program x analyst                          18.462    4        4.615    3.594  0.028
Program x analyst x condition               4.270    4        1.067    0.831  0.525
Error (program x analyst)                  20.547   16        1.284
Program x trial                             5.305   18        0.295    1.481  0.123
Program x trial x condition                 3.080   18        0.171    0.860  0.626
Error (program x trial)                    14.325   72        0.199
Analyst x trial                             5.874   18        0.326    1.690  0.061
Analyst x trial x condition                 2.461   18        0.137    0.708  0.792
Error (analyst x trial)                    13.900   72        0.193
Program x analyst x trial                   5.652   36        0.157    0.740  0.853
Program x analyst x trial x condition       5.550   36        0.154    0.727  0.868
Error (program x analyst x trial)          30.550  144        0.212

SS = Sum of Squares.
* Statistically significant at P < 0.025
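
Each F value in Table 3-8 is the ratio of an effect's mean square to the mean square of its matching error term, where a mean square is the sum of squares divided by its degrees of freedom. The sketch below recomputes three of the tabled F values from the tabled sums of squares; the function and variable names are assumptions of this illustration, and small differences from the table reflect rounding of the printed values.

```python
def mean_square(sum_of_squares, df):
    """Mean square = sum of squares / degrees of freedom."""
    return sum_of_squares / df


def f_ratio(ss_effect, df_effect, ss_error, df_error):
    """F = MS(effect) / MS(error)."""
    return mean_square(ss_effect, df_effect) / mean_square(ss_error, df_error)


# (effect SS, effect df, error SS, error df) as printed in Table 3-8.
table_3_8 = {
    "program": (2023.418, 2, 14.392, 8),
    "analyst": (346.485, 2, 5.032, 8),
    "analyst x condition": (10.654, 2, 5.032, 8),
}

f_values = {source: f_ratio(*row) for source, row in table_3_8.items()}
```

The recomputed ratios agree with the reported F values of 562.391, 275.424, and 8.469 to within rounding of the printed sums of squares.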


Within-Subject Factors

Although a significant difference was detected in the main effects of program, F
(7.072), P = 0.017, and analyst, F (607.498), P = 0.000, the ANOVA detected
statistically significant two-way interactions. The first interaction was between program
and analyst, F (12.486), P = 0.000. Figure 3-2 graphically depicts the means of program
and analyst indicating the interaction. The combination of the two within-subject factors
is also needed for true evaluation of any differences.

Table 3-9 provides the ANOVA table with the relevant statistics and probability
values for each factor. The second interaction was between analyst and condition, F
(151.435), P = 0.000 (see Figure 3-3). The third interaction was between program and
condition, F (8.007), P = 0.012. Figure 3-4 plots this interaction graphically.


Figure 3-1. A plot of analyst and condition means depicting the interaction for
Normalized Glottal Gap Area (NGGA). The lines connecting the data are
presented in all interaction figures in this chapter only for visual aid.


Figure 3-2. A plot of program and analyst means depicting the interaction for
Total Vocal Fold Length (TVFL).


Table 3-9. Analysis of variance examining the effects of program, analyst, condition
and interactions for TVFL.

Source                                Type III SS   df  Mean Square        F  P
Program                                   227.501    2      113.751    7.072  0.017*
Program x condition                       257.574    2      128.787    8.007  0.012*
Error (program)                           128.679    8       16.085
Analyst                                  1449.495    2      724.747  607.498  0.000*
Analyst x condition                       361.324    2      180.662  151.435  0.000*
Error (analyst)                             9.544    8        1.193
Trial                                       9.480    9        1.053    0.546  0.831
Trial x condition                          12.344    9        1.372    0.711  0.695
Error (trial)                              69.448   36        1.929
Program x analyst                         590.559    4      147.640   12.486  0.000*
Program x analyst x condition              61.216    4       15.304    1.294  0.314
Error (program x analyst)                 189.185   16       11.824
Program x trial                            44.961   18        2.498    1.395  0.161
Program x trial x condition                37.772   18        2.098    1.172  0.307
Error (program x trial)                   128.904   72        1.790
Analyst x trial                            47.271   18        2.626    1.281  0.227
Analyst x trial x condition                23.395   18        1.300    0.634  0.861
Error (analyst x trial)                   147.636   72        2.050
Program x analyst x trial                  65.140   36        1.809    0.947  0.560
Program x analyst x trial x condition      57.580   36        1.599    0.837  0.728
Error (program x analyst x trial)         275.114  144        1.911

SS = Sum of Squares.
* Statistically significant at P < 0.025


Post  Hoc  Tests 

Follow  up  post  hoc  tests  were  run  on  the  interaction  terms  reported  above  using 
the  Shaffer-Holm  procedure  (Shaffer  1986,  1991).  In  general,  this  procedure  conducts  a 
Simple  Effect  ANOVA,  and  then  follows  up  with  t-tests.  In  order  to  test  significance,  the 
factor  in  question  should  be  significant  in  both  Simple  Effect  ANOVA  and  the  t-tests. 




Figure 3-3. A plot of analyst (A1-3) and condition means depicting the interaction for
Total Vocal Fold Length (TVFL).


This procedure is designed to protect against Type I error inflation, that is, an increased
chance of rejecting the null hypothesis when it is true. It should be noted here that the
probability level was adjusted (p < 0.05/3 = 0.0167).
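
The adjusted probability level quoted above is the family-wise level divided by the number of tests. The short Python sketch below only illustrates that arithmetic; the helper name `bonferroni_alpha` is hypothetical and was not part of the original analysis.

```python
def bonferroni_alpha(family_alpha: float, n_tests: int) -> float:
    """Return the per-test significance level under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return family_alpha / n_tests


# Three tests sharing a family-wise level of 0.05 (for example, one per analyst).
alpha_three = bonferroni_alpha(0.05, 3)

# Two tests sharing the same level (normal and pathological conditions).
alpha_two = bonferroni_alpha(0.05, 2)

print(round(alpha_three, 4), alpha_two)
```

The two values correspond to the 0.0167 and 0.025 thresholds used in the post hoc tables.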


[Figure 3-4 plot: vertical axis ticks from 80 to 130; lines for Gimp, Scion, and Photoshop; horizontal axis categories Normal and Pathological.]


Figure  3-4.  A plot  of  program  and  condition  means  depicting  the  interaction  for  Total 
Vocal  Fold  Length  (TVFL). 




Normalized  Glottal  Gap  Area  (NGGA) 

Analyst  by  Condition  Interaction 

A Simple Effect ANOVA was run on the interaction of analyst by condition for
the normal condition. The Bonferroni procedure was used to correct for Type I error (α =
0.05 / 2 = 0.025). The ANOVA detected a significant difference among analysts, F
(100.590), P = 0.000. Table 3-10 shows the statistical results. It should be noted here that
condition is a between-subject factor.

Table 3-10. Simple Effect ANOVA examining interaction of analyst by condition
(normal) for NGGA.

Source              Type III Sum of Squares   df    Mean Square   F         P
Analyst             4.296                     2     2.148         100.590   0.000*
Error (analyst)     8.541E-02                 4     2.135E-02

* Statistically significant at P < 0.025
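
The entries of Table 3-10 can be cross-checked with the usual ANOVA arithmetic: each mean square is a sum of squares divided by its degrees of freedom, and F is the effect mean square divided by the error mean square. The sketch below assumes that this standard relationship applies to the table; `f_ratio` is a hypothetical helper name introduced for illustration.

```python
def f_ratio(ss_effect, df_effect, ss_error, df_error):
    """Return (effect mean square, error mean square, F) from sums of squares."""
    ms_effect = ss_effect / df_effect
    ms_error = ss_error / df_error
    return ms_effect, ms_error, ms_effect / ms_error


# Values copied from Table 3-10 (analyst effect, normal condition, NGGA).
ms_analyst, ms_error, f_value = f_ratio(4.296, 2, 8.541e-02, 4)

print(ms_analyst, ms_error, f_value)
```

The recomputed F comes out close to the tabulated 100.590; the small difference reflects rounding of the published sums of squares.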


Follow up t-tests were run. The t-tests indicated a significant difference between
Analyst 1 and Analyst 2 (tcritical 0.05/2 (2, 2) = 6.21, t (2) = 17.75), between Analyst 1 and
Analyst 3 (tcritical 0.05/2 (2, 2) = 6.21, t (2) = 10.21), and between Analyst 2 and Analyst 3
(tcritical 0.05/2 (1, 2) = 4.303, t (2) = 9.78).

A Simple Effect ANOVA was also run on the interaction of analyst by condition
for the pathological condition. Results indicated significant differences among analysts, F
(184.85), P = 0.000. Table 3-11 shows the statistical results.

Table 3-11. Simple Effect ANOVA examining the interaction of analyst by condition
(pathological) for NGGA.

Source              Type III Sum of Squares   df    Mean Square   F         P
Analyst             7.609                     2     3.805         184.85    0.000*
Error (analyst)     8.233E-02                 4     2.058E-02

* Statistically significant at P < 0.025

Follow up t-tests were run. Results indicated a significant difference between
Analyst 1 and Analyst 2 (tcritical 0.05/2 (1, 2) = 4.303, t (2) = 4.53), between Analyst 1 and
Analyst 3 (tcritical 0.05/2 (2, 2) = 6.21, t (2) = 14.17), and between Analyst 2 and Analyst 3
(tcritical 0.05/2 (2, 2) = 6.21, t (2) = 14.10).

Total Vocal Fold Length (TVFL)

Analyst by Condition Interaction

A Simple Effect ANOVA was run on the interaction of analyst by condition for
the normal condition. The Bonferroni procedure was used to correct for Type I error (α =
0.05 / 2 = 0.025). The ANOVA detected a significant difference among analysts, F
(147.885), P = 0.000. Table 3-12 shows the statistical results along with the
corresponding degrees of freedom. Again, it should be noted here that condition is a
between-subject factor.

Follow up t-tests were run. Results indicated a significant difference between
Analyst 1 and Analyst 2 (tcritical 0.05/2 (2, 2) = 6.21, t (2) = 11.44), between Analyst 1 and
Analyst 3 (tcritical 0.05/2 (2, 2) = 6.21, t (2) = 10.21), but not between Analyst 2 and Analyst
3 (tcritical 0.05/2 (1, 2) = 4.303, t (2) = 2.25).

Table 3-12. Simple Effect ANOVA examining the interaction of analyst by condition
(normal) for TVFL.

Source              Type III Sum of Squares   df    Mean Square   F         P
Analyst             18.958                    2     9.479         147.885   0.000*
Error (analyst)     0.356                     4     6.410E-02

* Statistically significant at P < 0.025

A Simple Effect ANOVA was also run on the interaction of analyst by condition
for the pathological condition. Results indicated significant differences among analysts, F
(1341.201), P = 0.000. Table 3-13 shows the statistical results.

Table 3-13. Simple Effect ANOVA examining the interaction of analyst by condition
(pathological) for TVFL.

Source              Type III Sum of Squares   df    Mean Square   F          P
Analyst             41.402                    2     20.701        1341.201   0.000*
Error (analyst)     6.174E-02                 4     1.543E-02

* Statistically significant at P < 0.025


Follow up t-tests were run. In this case, results of the t-tests indicated no
significant difference between Analyst 1 and Analyst 2 (tcritical 0.05/2 (1, 2) = 4.303, t (2) =
1.52). However, there were significant differences between Analyst 1 and Analyst 3
(tcritical 0.05/2 (2, 2) = 6.21, t (2) = 48.08), and between Analyst 2 and Analyst 3 (tcritical 0.05/2
(2, 2) = 6.21, t (2) = 37.97).

Program  by  Condition  Interaction 

A Simple Effect ANOVA was also run on the interaction of program by condition
for the normal condition. Results indicated significant differences among programs, F
(10.848), P = 0.024. Table 3-14 shows the statistical results. Follow up t-tests were run.
Results indicated a significant difference between Adobe Photoshop and Scion Image
(tcritical 0.05/2 (2, 2) = 6.21, t (2) = 11.22). However, no significant difference was detected
between Gimp and Scion Image (tcritical 0.05/2 (2, 2) = 6.21, t (2) = 3.89). Also, there was no
significant difference detected between Adobe Photoshop and Gimp (tcritical 0.05/2 (1, 2) =
4.303, t (2) = 0.045).

Table 3-14. Simple Effect ANOVA examining the interaction of program by condition
(normal) for TVFL.

Source              Type III Sum of Squares   df    Mean Square   F         P
Program             15.855                    2     7.928         10.848    0.024*
Error (program)     2.923                     4     0.731

* Statistically significant at P < 0.025

A Simple  Effect  ANOVA  was  also  run  on  the  interaction  of  program  by  condition 
for  the  pathological  condition.  Results  indicated  no  significant  differences  among 
programs,  F (0.459),  P = 0.661.  Table  3-15  shows  the  statistical  results. 


Table 3-15. Simple Effect ANOVA examining the interaction of program by condition
(pathological) for TVFL.

Source              Type III Sum of Squares   df    Mean Square   F         P
Program             0.314                     2     0.157         0.459     0.661
Error (program)     2.923                     4     0.342

* Statistically significant at P < 0.025


Follow up t-tests were not pursued in this case because there was no need for
further analysis. Although the interaction of program by condition was significant, the
Simple Effect ANOVA for the pathological condition was not. As stated above, the Shaffer-Holm
procedure requires significance in both the Simple Effect ANOVA and the t-tests.
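
The two-stage rule described above can be sketched in Python as follows. This is only an illustration of the decision logic as the text states it (a pairwise difference counts only when both the Simple Effect ANOVA and the t-test are significant), not a full implementation of the Shaffer-Holm procedure. The function name `pairwise_significant` is hypothetical; the numbers are those reported for the program by condition interaction.

```python
def pairwise_significant(anova_p, alpha, t_observed, t_critical):
    """True only if the simple-effect ANOVA and the follow-up t-test both pass."""
    anova_ok = anova_p < alpha                # stage 1: Simple Effect ANOVA
    t_ok = abs(t_observed) > t_critical       # stage 2: follow-up t-test
    return anova_ok and t_ok


ALPHA = 0.05 / 2  # Bonferroni-adjusted level for the two conditions

# Normal condition: ANOVA P = 0.024, so the t-tests decide each pair.
photoshop_vs_scion = pairwise_significant(0.024, ALPHA, 11.22, 6.21)
gimp_vs_scion = pairwise_significant(0.024, ALPHA, 3.89, 6.21)
photoshop_vs_gimp = pairwise_significant(0.024, ALPHA, 0.045, 4.303)

# Pathological condition: ANOVA P = 0.661, so no pair can be declared
# significant whatever the t value (the t value used here is a placeholder).
pathological_any = pairwise_significant(0.661, ALPHA, 99.0, 6.21)

print(photoshop_vs_scion, gimp_vs_scion, photoshop_vs_gimp, pathological_any)
```

Only the Adobe Photoshop versus Scion Image comparison in the normal condition passes both stages, which matches the results reported above.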




Program  by  Analyst  Interaction 

A Simple Effect ANOVA was run on the interaction of program by analyst to test
if there were significant differences among programs for Analyst 1, Analyst 2 and
Analyst 3. The Bonferroni procedure was used to correct for Type I error (α = 0.05 / 3 =
0.0167). The ANOVA detected a significant difference among programs for Analyst 1, F
(16.168), P = 0.001. Table 3-16 shows the statistical results.

Table 3-16. Simple Effect ANOVA examining the interaction of program by Analyst 1
for TVFL.

Source              Type III Sum of Squares   df    Mean Square   F         P
Analyst 1           27.963                    2     13.981        16.168    0.001*
Error (analyst)     0.206                     10    2.058E-02

* Statistically significant at P < 0.0167

Follow up t-tests were run. Results indicated no significant difference between
Adobe Photoshop and Gimp (tcritical 0.05/2 (3, 3) = 3.534, t (2) = 0.08). However, there was
a significant difference between Adobe Photoshop and Scion Image (tcritical 0.05/2 (3, 3) =
3.534, t (2) = 4.608), and between Gimp and Scion Image (tcritical 0.05/2 (3, 3) = 3.534, t (2) =
3.94). Although the ANOVA detected a significant difference among programs for
Analyst 2, F (20.155), P = 0.013, follow up t-tests detected no significant differences.
This was acceptable because the Shaffer-Holm procedure requires detecting a significant
difference in both the Simple Effect ANOVA and the follow up t-tests. Table 3-17
shows the statistical results. No further analysis was pursued because the follow up t-tests
indicated no significant difference between Adobe Photoshop and Gimp (tcritical
0.05/2 (3, 3) = 3.534, t = 0.19), between Adobe Photoshop and Scion Image (tcritical 0.05/2 (3, 3) =
3.534, t (2) = 3.16), or between Gimp and Scion Image (tcritical 0.05/2 (3, 3) = 3.534, t (2) =
2.59). No significant differences among programs were detected by the Simple Effect
ANOVA for Analyst 3, F (2.645), P = 0.120. Therefore, further analysis was not pursued.
Table 3-18 shows the statistical results.


Table 3-17. Simple Effect ANOVA examining the interaction of program by Analyst 2
for TVFL.

Source              Type III Sum of Squares   df    Mean Square   F         P
Analyst 2           40.310                    2     20.155        20.155    0.013*
Error (analyst)     0.206                     10    7.971E-02

* Statistically significant at P < 0.0167


Table 3-18. Simple Effect ANOVA examining the interaction of program by Analyst 3
for TVFL.

Source              Type III Sum of Squares   df    Mean Square   F         P
Analyst 3           13.533                    2     6.766         2.645     0.120
Error (analyst)     25.578                    10    2.558

* Statistically significant at P < 0.0167


CHAPTER  4 
DISCUSSION 

Three imaging software packages were examined for their consistency in
measuring laryngeal dimensions. Specifically, Adobe Photoshop 5.5, Gimp version
0.99.X/1.0 and Scion Image Beta 4.2 were examined for their inter-program and
inter-analyst consistency in the measurement of normalized glottal gap area (NGGA) and total
vocal fold length (TVFL). Inter-program consistency was examined by testing the
differences  among  the  three  software  packages  mentioned  above.  The  variability  of  each 
program  was  also  assessed.  Inter-analyst  consistency  was  examined  by  testing  the 
differences  and  examining  the  variability  of  measurements  across  three  analysts. 
Endoscopic  image  frames  obtained  from  a subject  with  normal  voice  production  and  a 
patient  with  unilateral  adductor  vocal  fold  paralysis  producing  sustained  vowels  were 
used  as  the  samples  for  analysis. 

The findings are discussed in order of the experimental hypotheses posed. First, it
was  hypothesized  that  there  would  be  no  significant  inter-program  differences  or 
variability  in  the  measurement  of  NGGA  and  TVFL  in  the  normal  and  pathological 
larynx.  Second,  it  was  hypothesized  that  there  would  be  no  significant  inter-analyst 
differences  or  variability  for  NGGA  and  TVFL  within  each  of  the  three  programs  for 
both  the  normal  and  pathological  larynx.  For  each  hypothesis,  the  discussion  that  follows 
includes  related  studies  previously  reported  in  the  review  of  literature,  interpretations 
drawn  from  the  statistical  analysis,  and  possible  explanations  for  the  patterns  observed. 




Normalized  Glottal  Gap  Area  (NGGA) 


Effect  of  Condition 

The  statistical  analysis  indicated  a significant  main  effect  for  condition.  Although 
the  effect  of  condition  would  not  be  traditionally  analyzed  separately  (because  of  the 
existence  of  analyst  by  condition  interaction),  the  significant  mean  differences  between 
normal  images  and  pathological  images  were  expected  because  the  effect  of  condition 
differentiates  between  the  normal  and  the  pathological  larynx.  When  a laryngeal 
pathology  is  present,  the  structural  dimensions  of  the  larynx  are  altered  (increase  in  mass, 
decrease  in  length,  etc.).  In  a vocal  fold  paralysis  case,  the  paralyzed  vocal  fold  length 
changes  because  of  changes  in  elasticity.  The  intact  vocal  fold  is  also  affected  because  of 
the functional unity of both vocal folds. Naturally, these changes are reflected in the
resulting image frames. Additionally, an endoscopic image frame of a pathological larynx
is  typically  larger  in  size  than  an  endoscopic  frame  of  a normal  larynx.  Clinicians  tend  to 
prefer  magnified  views  of  a pathological  larynx  to  obtain  as  much  visual  information  as 
possible  because  details  are  crucial  in  the  assessment  of  laryngeal  pathology.  Thus, 
different  image  sizes  are  obtained.  When  scoping  a normal  larynx,  and  to  decrease 
subject  discomfort,  clinicians  tend  to  be  satisfied  with  an  overview  image  that  provides  a 
general  idea  about  the  structure  and  function  of  the  larynx  in  question.  It  is  not  surprising, 
then,  for  programs  and  analysts  to  obtain  larger  means  for  NGGA  of  pathological  frames 
as  compared  to  normal  frames. 

The standard deviations for the programs were listed in Table 3-7. The results for
the normal image frames using Adobe Photoshop and Scion Image showed larger
standard deviations (an indication of more variability) than those associated with the
pathological image frames. In general, the pathological frames used in this study
provided better visibility than the normal images for the measurement of glottal gap area.
The pathological frames were larger in size than the normal frames, and both sets of
frames were enlarged to 400% of their original size. As a result, the pathological frames
remained larger and more visible than the normal frames. However,
Gimp  produced  larger  standard  deviations  for  the  normal  frames  than  the  pathological 
frames.  The  effect  of  the  analyst  might  also  have  been  present  (because  an  analyst  by 
condition  interaction  was  significant).  The  differences  among  the  results  obtained  by  the 
three  analysts,  as  they  interacted  with  the  condition  measured,  may  have  contributed  to 
the  condition-related  differences  of  the  resulting  measurements  among  the  three 
programs. 

As  long  as  there  is  no  standardized  clinical  procedure  for  imaging  normal  and 
pathological  larynges,  differences  between  measuring  the  laryngeal  dimensions  from  the 
resulting  image  frames  should  be  expected. 

Effect of Program

Differences existed among results produced by each of the three programs (Adobe
Photoshop, Gimp and Scion Image) in the measurement of NGGA. The three programs
failed to produce consistent results regardless of the analysts, their level of experience
and the condition measured. Thus, the first part of the first null hypothesis was rejected.
Namely,  there  were  significant  inter-program  differences  in  measuring  glottal  gap  area 
for  both  conditions.  The  largest  mean  for  NGGA  was  produced  by  Gimp,  followed  by 
Adobe  Photoshop,  while  Scion  Image  produced  the  smallest  mean.  The  standard 
deviations calculated for the three programs (Table 3-7) provide an indication as to which
program produced more variability in the measurement of NGGA. For the normal
condition, Adobe Photoshop results were the most variable, whereas both Scion Image
and Gimp results were similarly less variable. For the pathological condition, Gimp
results were the most variable, followed by Adobe Photoshop and Scion Image. Thus, the
second part of the first hypothesis was rejected. Namely, there was inter-program
variability when measuring glottal gap area for both conditions. This was an interesting
result, because Scion Image (and its NIH Image Macintosh counterpart) is most
commonly used by speech pathologists and voice scientists. Examples of studies using
NIH Image (Scion Image counterpart) include Omori et al. (1996, 1998) and Inagi et al.
(1997). Although Scion Image had other problems that are discussed later in this chapter,
it seems that it was the program with the least variable results. This could be due, in part,
to the fact that Scion Image was originally designed by The National Institutes of Health
(NIH) and has been mostly used for clinical purposes, while Adobe Photoshop and Gimp
were designed for web design purposes, and were not specifically designed for clinical
digital image analysis. This was reflected in the fact that Scion Image was more user
friendly than Photoshop and Gimp for GGA and TVFL measurement. This user
friendliness may have reduced variability in these measurements. For example, using the
Measure tab in Scion Image spared the analysts the relatively time-consuming task of
generating an image histogram. A histogram had to be generated when using Adobe
Photoshop and Gimp to measure NGGA.
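
To illustrate why a histogram can serve as an area measurement, the Python sketch below counts pixels from a gray-level histogram. It is a simplified illustration under stated assumptions and not the procedure used in the study: the toy frame values are invented, the cutoff of 50 is hypothetical, and the helper names `gray_histogram` and `pixel_area` are introduced only for this example.

```python
def gray_histogram(pixels):
    """Count how many pixels take each gray value from 0 to 255."""
    counts = [0] * 256
    for value in pixels:
        counts[value] += 1
    return counts


def pixel_area(counts, darkest=0, brightest=255):
    """Sum the histogram bins in [darkest, brightest] to get an area in pixels."""
    return sum(counts[darkest:brightest + 1])


# Toy 4 x 4 grayscale frame (invented values); low values stand for a dark gap.
frame = [
    [200, 190, 185, 210],
    [180,  20,  15, 205],
    [175,  10,  25, 198],
    [190, 188, 192, 201],
]
selection = [value for row in frame for value in row]
counts = gray_histogram(selection)

# Hypothetical cutoff: pixels with values of 50 or less are treated as the gap.
gap_area = pixel_area(counts, 0, 50)
total_area = pixel_area(counts)

print(gap_area, total_area)
```

In this toy frame the gap covers 4 of the 16 pixels. A program with a direct measurement command returns such a count in one step, which is the convenience the paragraph above attributes to Scion Image.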

One of the studies that utilized NIH Image (Macintosh version of Scion Image) in
digital image analysis was Omori et al. (1998), who reported an NGGA
mean of 6.8 pixels for a group of 20 patients with vocal fold paralysis. The reported
mean, however, cannot be used for the purpose of comparison with the current study,
because the set of image frames used was different and the normalization formula used in
both studies did not successfully control for endoscopic vertical movement.

However, to compare variability among results obtained by the current study and
those obtained by Omori et al. (1998), a Coefficient of Variation (SD / X̄ × 100, where X̄
is the mean) was calculated. The Coefficient of Variation provides a way to compare the
variability of two different data distributions because it expresses data dispersion relative
to the mean. The calculated Coefficient of Variation for Omori et al. (1998) was 69.12. In
comparison, a much smaller Coefficient of Variation was calculated for the current study
(4.25). It is believed that the large Coefficient of Variation for NGGA in Omori et al.
(1998) for the vocal fold paralysis group was due to the manual glottal area tracing and
the lack of automated edge detection procedures. This issue is discussed in detail
later in this chapter.

Inagi et al. (1997) used NIH Image (Scion Image counterpart) to measure three
pathological  endoscopic  frames  for  each  of  their  43  patients  (20  males  and  23  females). 
One  of  their  measures  was  "the  objective  measurement  of  glottal  gap"  (p.784).  They 
obtained  the  measurements  while  their  subjects  had  full  adduction  of  the  vocal  folds 
during  modal  phonation,  full  abduction  during  maximum  voluntary  inspiration,  and  at 
resting  position.  Four  reference  points  were  determined  for  each  glottal  frame:  anterior 
commissure,  paralyzed  side  vocal  process,  normal  side  vocal  process,  and  midarytenoid 
point.  They  stated  that  the  midarytenoid  point  was  determined  after  reviewing  vocal  fold 
movement patterns to exclude bias resulting from the position of the paralyzed fold. The
investigators thus depended on their judgment in determining the midarytenoid point. In
fact,  all  four  reference  points  were  determined  based  on  analyst  judgment.  This  method, 
while  useful  as  an  introductory  technique,  failed  to  recognize  the  importance  of  the 
complex  nature  of  the  vocal  folds'  edges  and  the  relative  nature  of  the  reference  points. 
While  determining  the  points  of  the  anterior  commissure  and  the  tip  of  the  arytenoids 
may  be  acceptable,  a reference  point  such  as  the  midarytenoid  point  is  difficult,  if  not 
impossible,  to  determine  because  it  is  based  on  vocal  fold  length  measurement.  A 
Coefficient  of  Variation  could  not  be  obtained  for  the  Inagi  et  al.  (1997)  results  because 
the  main  interest  of  their  study  was  to  use  the  glottal  area  measurements  for  the  purpose 
of  correlation  with  vocal  functions.  Thus,  correlation  statistical  results  were  reported  and 
normalized  glottal  area  measurements  were  not. 

Woo (1996) used an improved technique to measure various glottal area
waveform parameters using commercially available image analysis software (Bioscan).
Woo reported a measure of peak glottal area value (the largest area value achieved during
the glottal open phase) across four modes of phonation: modal, high, low and loud. For
modal phonation produced by male subjects, the mean peak glottal area was 5.4 with a standard
deviation of 2 (Coefficient of Variation = 37.04), and for female subjects producing
modal phonation, the mean was 4.3 with a standard deviation of 0.13 (Coefficient of
Variation = 3.02). The current study obtained Coefficients of Variation of 7.54 using
Scion Image, 4.69 using Gimp, and 7.36 using Adobe Photoshop (Coefficient of Variation
for collective data of the three programs = 6.25). It should be noted that the current study
used modal phonation without consideration of the sex of the subjects. While the
Coefficient of Variation for females in Woo (1996) was lower than the Coefficients of
Variation reported by the current study, the Coefficient of Variation for males in the Woo
(1996) study was much higher than the Coefficients of Variation reported by the current
study.
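
The Coefficients of Variation quoted for Woo (1996) can be reproduced from the reported means and standard deviations. The Python sketch below applies the SD / mean × 100 definition given earlier; the helper name `coefficient_of_variation` is hypothetical.

```python
def coefficient_of_variation(sd, mean):
    """Return the standard deviation as a percentage of the mean."""
    if mean == 0:
        raise ValueError("mean must be non-zero")
    return sd / mean * 100


# Peak glottal area during modal phonation, as reported for Woo (1996).
cv_male = coefficient_of_variation(2, 5.4)
cv_female = coefficient_of_variation(0.13, 4.3)

print(round(cv_male, 2), round(cv_female, 2))
```

Rounded to two decimals, the results are 37.04 and 3.02, which agree with the values given in the text.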

Although  Woo  (1996)  used  automated  tracing  of  the  glottal  gap,  his  method 
required  manual  tracing  for  some  areas  where  light  and  shadow  effects  were  prominent. 
Examples  where  these  areas  were  present  were  the  anterior  and  posterior  commissure. 
Points  of  reference  used  by  Woo  (1996)  were  manually  determined  and  used  to  calculate 
vocal  fold  length,  which  was  used  to  normalize  the  glottal  gap  area.  Manual  tracing  was 
also  necessary  to  remove  the  effects  of  mucus  stranding.  Thus,  Woo's  technique  was  not 
fully  automated  and  depended  on  the  analyst's  skill  to  a certain  degree.  Other 
environmental  factors,  such  as  analyst  to  screen  distance,  head  movement,  user  fatigue, 
and measurement time were not controlled, or at least not reported on. The multipixel
width of the vocal fold boundaries was not addressed. The current study attempted to
control all of these factors, which may account for the differences in the results
discussed above. These factors bear directly on the consistency of measurement, and
addressing them is essential to achieving it.

Effect  of  Analyst  by  Condition  Interaction 

A significant  interaction  of  analyst  by  condition  existed.  Post-hoc  testing  revealed 
that  the  results  obtained  by  all  analysts  contributed  to  the  inconsistent  results.  The  three 
analysts  differed  significantly  across  both  conditions.  Thus,  the  first  part  of  the  second 
hypothesis  was  rejected.  Namely,  there  were  inter-analyst  differences  in  measuring 
glottal gap area for both conditions. It should be noted, however, that the means and
standard deviations obtained by Analyst 1 (expert) and Analyst 2 (naive) were lower than,
and in closer agreement with each other than with, those obtained by Analyst 3 (expert). Overall, Analyst 3 obtained higher
standard  deviations  for  NGGA  than  Analyst  1 and  Analyst  2 across  programs,  conditions, 
and  frames.  Thus,  the  second  part  of  the  second  hypothesis  was  rejected.  Namely,  there 
was  variability  among  analysts  for  the  measurement  of  glottal  gap  area  for  both 
conditions. 

The studies that investigated multiple measurers or raters provide evidence
clarifying inter-rater differences. The findings of such studies indicated that regardless of
the measurers' or raters' level of experience, inter-rater consistency is low (Kreiman et al.
1992, Kreiman & Gerratt 1998). In fact, Kreiman et al. (1990) concluded that training
and experience cause individuals to differ more, not less, in how they perceive vocal
quality. Human visual perception is as vulnerable to bias as auditory
perception (Kent 1996). Measuring the glottal gap area by manual tracing depends
considerably on the analyst's visual perception, preconceptions of
how to define the glottal area, and visual motor coordination. The analyst's visual motor
coordination  is  an  important  factor  to  consider,  and  might  have  affected  inter-analyst 
consistency  during  the  manual  tracing  of  the  glottal  gap.  It  is  documented  that  there  is  a 
timing  discrepancy  between  eye  movement  and  hand  movement  when  a user  is  moving  a 
cursor  on  video  display  from  a starting  point  to  a target  point.  Eye  movement  is  usually 
faster  than  hand  movement.  When  the  hand  reaches  the  target,  another  period  of  time 
elapses  while  the  hand  is  engaged  in  error  correction.  Sometimes,  error  correction  does 
not  happen  (Abrams  1992). 

Manual tracing of the glottal area can also be affected by optical illusion. Visual
information is not always accurate because the brain relies on computational processes to
fill in missing visual information. For example, when the human visual system
encounters  a bright  area,  it  automatically  searches  for  edges  in  order  to  discriminate 
among  similar  brightness  areas.  As  a result,  the  visual  system  discards  gradual  changes  in 
brightness  that  do  not  reach  the  threshold  of  brightness  set  by  the  visual  system  to 
establish  a geometric  shape  (Frisby  1979,  Russ  1992).  Another  characteristic  of  the 
human  visual  system  is  its  automatic  tendency  to  group  parts  within  an  image  together  so 
the  brain  can  make  visual  sense  of  the  presented  visual  information.  Also,  the  visual 
system  tends  to  arrange  different  objects  in  the  same  orientation  in  order  to  compare 
and/or  group  them  (Russ  1992). 

These  characteristics  of  the  visual  system  allow  for  a number  of  visual  illusions, 
which occur when the brain is misled by an unusual stimulus. For example, a
well-documented illusion is Fraser's spiral, which is essentially a two-dimensional drawing
that consists of concentric circles. These circles give the illusion of a three-dimensional
spiral.  However,  the  image  of  a spiral  resides  only  in  the  viewer's  brain.  This  illusion  is 
so  powerful  that  it  misleads  finger  movement,  if  one  tries  to  trace  the  spiral  (Frisby 
1979).  What  we  see  sometimes  differs  dramatically  from  what  is  actually  before  our  eyes. 

Contrary to general belief, training does not always improve analysts'
performance. In fact, it was found that trained analysts rely more on visual information to
control their hand movement than untrained analysts do (Proteau 1992). This makes the
whole process even more visually dependent.

In conclusion, the total dependency on the human factor, manifested by manual
tracing of laryngeal structures, should be reduced to a minimum. An automated or
semi-automated technique for measuring laryngeal dimensions is advocated. Such
dependency, accompanied by the above-discussed lack of standardized clinical procedures for obtaining
normal and pathological images of the larynx, may substantially increase the potential for
measurement error.


Total Vocal Fold Length (TVFL)


Effect  of  Condition 

The  statistical  analysis  indicated  a significant  main  effect  for  condition.  Although 
the  effect  of  condition  would  not  be  traditionally  analyzed  separately  (because  of  the 
existence  of  program  by  condition  and  analyst  by  condition  interactions),  the  significant 
mean  differences  between  normal  images  and  pathological  images  were  expected  because 
the  effect  of  condition  differentiates  between  the  normal  and  the  pathological  larynx.  This 
differentiation  is  discussed  above  (for  glottal  gap  area)  and  applies  to  the  total  vocal  fold 
length.  The  normal  frames  had  a smaller  standard  deviation  than  the  pathological  frames. 
According  to  the  analysts,  it  was  easier  to  measure  TVFL  in  normal  frames  than  it  was  in 
pathological  frames  because  of  the  relative  ease  of  locating  the  two  points  used  in  this 
measurement.  These  were  the  point  of  the  anterior  commissure  and  the  point  of  the  vocal 
process  of  the  vocal  fold.  The  larger  standard  deviation  for  the  pathological  frames 
measurement  may  have  been  caused  by  the  arytenoid  of  the  normal  vocal  fold  side 
obscuring  the  point  of  the  vocal  process  of  the  paralyzed  side,  thus  causing  poor  visibility 
of  the  vocal  process  point  in  the  paralyzed  vocal  fold. 

Effect  of  Program  by  Analyst  Interaction 

Post hoc testing for the interaction of program by Analyst 1 showed that while
Adobe Photoshop and Gimp produced similar data, Adobe Photoshop and Scion Image,
and Gimp and Scion Image did not. Thus, the first part of the first hypothesis was
rejected. Namely, there were significant inter-program differences when measuring
TVFL for both conditions. When Analyst 1 was using Scion Image to measure TVFL, the
result was a higher mean than when he was using either Adobe Photoshop or Gimp.
Thus, the cause of the differences among programs when Analyst 1 was measuring TVFL
seems to be Scion Image. This was not the case when Analyst 2 or Analyst 3 was
measuring TVFL because the three programs produced similar data.

Overall, the cause of the inconsistencies seems to be Scion Image, but only when
Analyst 1 was making measurements. A potential cause of such inconsistencies may
be the differences among programs in terms of image display. Scion Image had lower
image visibility than Adobe Photoshop and Gimp. Both Adobe Photoshop and Gimp
display grayscale images with pixel brightness values ranging from zero to 255, with
a value of zero displayed as black and 255 as white. However, Scion Image displays a
value of zero as white and a value of 255 as black. This caused images displayed
by Scion Image to appear brighter than those displayed by Adobe Photoshop and Gimp,
which, in turn, contributed to the interaction between program and analyst. The manual
tracing of the total vocal fold length is another potential factor. Although the images were
enlarged to 400%, visual motor coordination and the possibility of optical illusion
may have played an important role in inter-analyst consistency.
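
The display difference described above can be sketched in a few lines of Python. This is an illustration only, not code taken from any of the three programs: the function names and the 2 x 3 sample image are hypothetical, and an 8-bit grayscale image is assumed.

```python
def conventional_lookup_table(levels=256):
    """Stored value v is displayed at brightness v (0 = black, 255 = white)."""
    return list(range(levels))


def inverted_lookup_table(levels=256):
    """Stored value v is displayed at brightness levels - 1 - v (0 = white)."""
    return [levels - 1 - v for v in range(levels)]


def displayed(image, table):
    """Map every stored pixel value through a display lookup table."""
    return [[table[v] for v in row] for row in image]


# Hypothetical 2 x 3 grayscale image (stored values, not display brightness).
stored = [[0, 64, 128],
          [192, 255, 32]]

as_conventional = displayed(stored, conventional_lookup_table())
as_inverted = displayed(stored, inverted_lookup_table())
```

Under the inverted table the same stored data are shown with the opposite brightness polarity, so the analyst sees a different picture even though the stored values, and any pixel count derived from them, are unchanged.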

The standard deviations produced by Scion Image were lower than those produced
by Adobe Photoshop and Gimp across both conditions. Thus, the second part of the
second hypothesis was rejected. Namely, there was inter-program variability when
measuring TVFL for both conditions.




Effect  of  Program  by  Condition  Interaction 

Post  hoc  testing  for  the  interaction  of  program  by  condition  indicated  that  Scion 
Image  and  Adobe  Photoshop  measured  TVFL  differently  when  normal  frames  were 
examined,  where  Scion  Image  produced  a higher  mean  than  Adobe  Photoshop.  On  the 
other  hand,  Gimp  and  Scion  Image  measured  images  similarly  for  TVFL  when  the  normal 
frames  were  examined.  Also,  Gimp  and  Adobe  Photoshop  produced  similar  results  for 
TVFL  when  the  normal  frames  were  examined.  Adobe  Photoshop,  Gimp  and  Scion  Image 
produced  similar  results  when  examining  the  pathological  frames.  Thus,  post  hoc  testing 
indicated  that  the  pathological  condition  did  not  contribute  to  the  differences  among 
programs  when  measuring  TVFL.  Overall,  it  seems  that  the  cause  of  the  inconsistencies 
was  using  Scion  Image  when  measuring  the  normal  frames.  A possible  cause  may  have 
been  related  to  increased  brightness  of  the  normal  images  displayed  by  Scion  Image. 
Manual  location  of  TVFL  points,  differences  in  eye-hand  coordination,  and  the  potential 
optical  illusion  are  also  possible  factors. 

Effect  of  Analyst  by  Condition  Interaction 

Post hoc testing for the interaction of analyst by condition indicated that there were
significant inter-analyst differences in measuring TVFL for both conditions. Thus, the
first part of the second hypothesis was rejected.

Post  hoc  testing  of  analyst  differences  at  the  normal  condition  revealed  that 
Analyst  1 and  Analyst  2 produced  different  results  when  normal  image  frames  were 
examined,  with  Analyst  1 obtaining  a higher  mean  than  Analyst  2.  Also,  Analyst  1 
produced  different  results  than  Analyst  3 when  normal  image  frames  were  examined, 
with Analyst 1 obtaining a higher mean than Analyst 3. However, Analyst 2 and Analyst
3 produced similar results when normal image frames were examined. When pathological
image  frames  were  examined,  Analyst  1 and  Analyst  3 produced  different  results,  with 
Analyst  1 obtaining  a higher  mean  than  Analyst  3.  Analyst  2 and  Analyst  3 produced 
different  results,  with  Analyst  2 obtaining  a higher  mean  than  Analyst  3.  However, 
Analyst  1 and  Analyst  2 produced  similar  results. 

Overall, for the normal condition, it seems that Analyst 1 (expert) was a main
factor in producing the inconsistent results among the analysts. For the pathological
condition, it seems that Analyst 3 (expert) was a main factor in producing the inconsistent
results among the analysts. According to Kreiman et al. (1992, 1998) and Kent (1996),
discrepancies between the performance of Analyst 1 and Analyst 3 are expected to be
large, especially because both are expert raters. The more experienced raters become, the
more likely differences in their performance become. Experience probably causes
continuous change in the criteria used in defining the laryngeal structures and evaluating
vocal function. These criteria are influenced by the differences in the professional
training analysts receive, the differences in clinical and research practice, and the
differences in therapy orientation.

The standard deviations obtained by the three analysts across the three programs
and the two conditions were presented in Tables 3-1 through 3-6. Analyst 3 obtained
lower standard deviations for TVFL across programs, conditions and frames than both
Analyst 1 and Analyst 2. Thus, the second part of the second hypothesis was rejected.
Namely, there was inter-analyst variability when measuring TVFL for both conditions.
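
As a minimal sketch of how variability across repeated trials can be summarized, the following Python fragment computes a mean and a standard deviation per analyst. The numbers are placeholders chosen for easy arithmetic and are not data from this study. The function names are hypothetical, and using the sample standard deviation (n - 1 denominator) is an assumption about the calculation.

```python
from statistics import mean, stdev


def summarize_trials(trials_by_analyst):
    """Return {analyst: (mean, sample standard deviation)} for repeated trials."""
    return {
        analyst: (mean(values), stdev(values))
        for analyst, values in trials_by_analyst.items()
    }


def least_variable(summary):
    """Name of the analyst whose trials have the smallest standard deviation."""
    return min(summary, key=lambda analyst: summary[analyst][1])


# Placeholder numbers for illustration only; they are NOT data from this study.
example_trials = {
    "Analyst A": [100.0, 104.0, 96.0],
    "Analyst B": [100.0, 101.0, 99.0],
}
summary = summarize_trials(example_trials)
```

In this example both analysts have the same mean but different standard deviations, which is why differences (means) and variability (standard deviations) are examined separately.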

The inconsistency of manual tracing, along with its vulnerability to visual motor
coordination and optical illusion, applies to the measurement of the TVFL. TVFL was
measured by defining the anterior commissure and posterior commissure manually.
Replacing manual tracing of laryngeal structures with automated tracing
would reduce the dependency on the analysts, their level of experience and the
resulting biases.
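
The dependence of a two-point length measurement on landmark placement can be illustrated with a short Python sketch. It assumes the length is taken as the straight-line (Euclidean) distance in pixels between two manually selected points. The function names and coordinates are hypothetical.

```python
import math


def length_in_pixels(point_a, point_b):
    """Euclidean distance between two (x, y) landmark positions, in pixels."""
    return math.hypot(point_b[0] - point_a[0], point_b[1] - point_a[1])


def length_change(point_a, point_b, moved_point_b):
    """Change in measured length when the second landmark is placed elsewhere."""
    return length_in_pixels(point_a, moved_point_b) - length_in_pixels(point_a, point_b)


# Hypothetical landmark coordinates, chosen only to make the arithmetic easy.
anterior = (0, 0)
posterior = (30, 40)
baseline = length_in_pixels(anterior, posterior)  # 3-4-5 triangle scaled by 10
```

Because the measurement is determined entirely by the two selected points, any displacement of a landmark along the fold axis changes the reported length by the same amount, for example `length_change((0, 0), (0, 40), (0, 44))` gives a difference of four pixels.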

Thus, it can be concluded that digital image analysis of the larynx in its current
state cannot produce consistent results that could be used for clinical or research
purposes.


General  Observations 

Measuring  the  glottal  gap  area  and  total  vocal  fold  length  by  manual  tracing  has  a 
considerable  amount  of  dependency  on  the  visual  perception  of  the  analyst.  Although  a 
fixed  head  position  enhances  the  accuracy  of  eye-hand  coordination  (Carnahan  1992), 
manual  tracing  of  the  glottal  gap  area  is  influenced  by  the  analysts'  visual  motor 
coordination.  The  timing  discrepancy  between  eye  movement  and  hand  movement  and 
the  lack  of  hand  error  correction  seem  to  be  important  factors  that  may  have  affected  the 
results  of  the  current  study. 

Manual  tracing  of  the  glottal  area  and  the  vocal  fold  length  can  also  be  affected  by 
optical  illusion.  The  characteristics  of  the  human  visual  system  make  it  vulnerable  to  an 
unpredictable  number  of  such  illusions.  What  we  see  sometimes  differs  dramatically  from 
what  is  actually  before  our  eyes. 

Contrary to general belief, training does not always improve analysts'
performance. Trained analysts rely more on visual information to control their hand
movement than untrained analysts, which makes the whole process even more
vulnerable to errors resulting from dependency on the human visual system.




Issues  related  to  program  user  friendliness  and  image  display  are  also  important 
factors  to  consider  when  selecting  the  program  to  use  for  analysis  purposes.  It  was 
observed  that  Adobe  Photoshop  and  Scion  Image  were  more  user-friendly  than  Gimp. 
When  using  Adobe  Photoshop,  the  analysts  obtained  measurements  with  greater  ease.  For 
example,  in  the  measurement  of  the  glottal  gap  area,  the  analysts  reported  that  Adobe 
Photoshop  permitted  generating  an  image  histogram  for  the  manually  traced  area  with 
relative  ease.  The  histograms  generated  by  Adobe  Photoshop  and  Scion  Image  have  the 
ability  to  depict  only  the  selected  traced  glottal  gap  area.  The  histogram  generated  by 
Gimp,  on  the  other  hand,  depicted  the  whole  image.  A histogram  of  the  traced  area 
could not be generated unless the image was converted to RGB. Thus, the analysts had to
copy  the  traced  area,  and  then  paste  it  into  a newly  created  image  in  order  to  obtain  its 
histogram,  a relatively  time-consuming  process.  Scion  Image  was  also  relatively  easy  to 
use.  It  provided  the  analysts  with  a "measure"  tab  that  spared  them  the  necessity  of 
creating  an  image  histogram. 
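
The difference between a histogram of the traced region and a histogram of the whole image can be sketched as follows. This is a minimal illustration assuming that the area is read off as the number of pixels counted in the histogram of the selected region. The function names, the sample image, and the mask are hypothetical and do not reproduce any program's internal procedure.

```python
def region_histogram(image, mask, levels=256):
    """Count pixel values, but only where the mask marks the traced region."""
    counts = [0] * levels
    for image_row, mask_row in zip(image, mask):
        for value, selected in zip(image_row, mask_row):
            if selected:
                counts[value] += 1
    return counts


def region_area(image, mask):
    """Area of the traced region in pixels: the total of its histogram counts."""
    return sum(region_histogram(image, mask))


# Hypothetical 3 x 4 grayscale image with a dark gap in the middle.
image = [[10, 10, 200, 200],
         [10, 0, 0, 200],
         [10, 0, 0, 200]]

# Hypothetical manual trace: True marks pixels inside the traced outline.
traced = [[False, False, False, False],
          [False, True, True, False],
          [False, True, True, True]]

whole = [[True] * 4 for _ in range(3)]

traced_area = region_area(image, traced)
whole_area = region_area(image, whole)
```

A histogram restricted to the trace yields the area of the traced region directly, whereas a whole-image histogram only reports the total number of pixels in the frame, which is why the copy-and-paste workaround was needed.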

The fact that Scion Image had lower image visibility than Adobe Photoshop and
Gimp, due to differences in displaying black and white pixel values, also contributed to the
reported inconsistencies. Brightness levels may have concealed some image details, such
as edges. In combination with visual motor coordination and optical
illusion, this is expected to increase the potential for measurement error.

A 21-inch screen was used to display the images because they were enlarged to
400% of their original size to facilitate detecting the multi-pixel boundary of the glottal
gap. However, this enlargement created difficulty in displaying the image at the
center of the screen using Scion Image. Scion Image had a smaller viewing area than the
other two programs and the image was not automatically centered on the computer
screen. Such difficulty was not encountered when Adobe Photoshop or Gimp was used.

Clinical  Implications 

The  reported  discrepancy  among  the  programs  is  an  important  factor  that 
affects  the  clinical  assessment  and  interpretation  of  findings  of  normal  and 
pathological  laryngeal  frames.  For  example,  being  able  to  accurately  document  the 
change  in  glottal  gap  area  in  a vocal  fold  paralysis  case  provides  the  surgeon  and 
the  treatment  team  with  objective  evidence  that  the  procedure  has  successfully 
closed the glottal gap. Data such as these are meaningful for documenting the
functional  outcomes  from  a patient  perspective,  medical  perspective,  and  third  party 
payer  perspective.  However,  with  the  inconsistency  of  the  investigated  programs 
found  by  the  current  study,  the  use  of  digital  image  analysis  programs  to  quantify 
endoscopic  images  of  the  larynx  is  far  from  being  clinically  useable.  The 
inconsistencies  reported  among  analysts  in  this  study  are  also  a concern.  With  such 
inconsistencies,  reliance  on  digital  image  analysis  of  the  larynx  would  lead  to  large 
discrepancies  in  outcome  measures.  When  programs  or  instruments  are  employed  in 
outcome  assessment,  clinicians,  third  party  payers,  and  most  importantly  patients, 
expect objective, consistent outcome data. For clinicians to objectively demonstrate
the efficacy of an intervention strategy to both third party payers and patients, the
instruments or programs used need to comply with certain standards. Some of these
are validity, consistency, practicality, sensitivity and comprehensibility. Validity is
related to the technical ability of the program used to measure what it is supposed to
measure. Consistency means that the program should yield reliable results across
time by different analysts or examiners. Although the chance of error cannot be
totally  eliminated  in  this  case,  a consistent  program  should  yield  only  slight  errors 
that  can  be  identified  and  accounted  for.  A program  should  be  practical.  Practicality 
can  be  determined  in  terms  of  user  friendliness,  amount  of  analyst  training  required, 
amount  of  time  to  administer  measurements,  cost,  etc.  The  program  should  also  be 
sensitive.  This  means  that  a program  should  have  the  ability  to  detect  small  but 
meaningful  increments  of  change  that  serve  its  purpose.  For  example,  a program 
used to measure the change in glottal gap area pre- and post-thyroplasty should be
sensitive  enough  to  detect  small  amounts  of  change  in  the  glottal  gap  because  those 
changes  may  affect  the  patient's  vocal  quality.  Comprehensibility  is  related  to  the 
clarity  and  understandability  of  produced  data  to  the  end  user  (Frattali  1998). 

The  digital  image  analysis  programs  investigated  in  this  study  cannot  be 
clinically  employed  for  accurately  documenting  the  status  of  the  larynx.  Validity  is 
a multifaceted concept (e.g., construct validity, face validity, etc.); however, a
program  cannot  be  considered  valid  if  it  is  not  consistent.  Because  the  programs 
used  did  not  yield  consistent  data  across  all  analysts,  their  validity  is  questioned. 
Also, there were some practical difficulties the analysts encountered while
obtaining measurements. One of those is the lack of standardized built-in program
procedures for measurement of NGGA and TVFL across the three programs. While
Scion Image provided the analysts with a Measure tab to measure glottal gap area,
Adobe Photoshop and Gimp did not. A histogram had to be generated by Adobe
Photoshop and Gimp for that purpose. Comprehensibility was also an issue because
inter-program and inter-analyst differences and variability existed. If these
programs are used clinically, the generated data will probably not make "clinical
sense." A possible situation in a clinical setting might be where two expert analysts
use Scion Image, for example, to objectively document glottal gap measurement for
a patient's larynx post-thyroplasty. Legitimate questions as to which expert analyst's
results should be documented may arise. The potential outcome is also clinically
unstable and cannot be presented, if at all, to the patient and/or the third party payer.

The outcome data generated can be clinically misleading. Although the sensitivity of
the programs was not tested in this study because it was not within its scope, it is
probable that attempting to measure the change pre- and post-thyroplasty would
create further difficulties.

Although Scion Image emerged as the least variable and the most user-friendly
program, caution should be applied when interpreting its results. As mentioned
above, Scion Image is susceptible to potential error resulting from a number of
factors: the effect of image display capabilities, the effect of different levels of
analysts' visual motor coordination, and the potential effect of optical illusion.

Comparison  with  Pilot  Study 

While  the  pilot  study  measured  sixty-three  320  x 240-pixel  images  of  test  objects 
acquired  experimentally,  the  current  study  used  archival  endoscopic  images.  The  current 
study's  goal  was  to  obtain  image  frames  that  represented  real-life  clinical  situations.  Both 
studies  used  images  that  were  obtained  with  a Kay  RLS-9100  Rhino-Laryngeal 
Stroboscope. However, while the pilot study obtained images using both an oral rigid
endoscope and a flexible endoscope, the current study only used images obtained by an
oral rigid endoscope to eliminate distortions resulting from flexible endoscopes (such as
barrel distortion). Also, the current study discarded color information before conducting
measurement, to control for color distortion. Neither study enhanced images
following digitization, in order to preserve input data. While standard methods of color
segmentation (Schmalz & Caimi 1996) were used in the pilot study to yield segmentation
maps, three programs were used in the current study in an attempt to examine their
consistency.

The current study augmented existing reports in the literature and supported the
findings of the pilot study in showing three key problems with current techniques of
automated laryngeal image analysis. First, the glottal gap boundary is inaccurately
established due to a lack of empirical or analytical modeling of the vocal fold
cross-sectional profile and its effect upon image contrast. Second, the lack of a prespecified
or standardized clinical procedure for obtaining endoscopic images is a major factor
producing inconsistency of measurements. The third problem, relating to color constancy,
was not investigated in the current study. However, the pilot study did indicate that color
constancy should be investigated.

Generalizations  of  Findings 

Several researchers raised the issue of using digital image analysis for the
quantification of laryngeal endoscopic images (Gonçalves & Leonard 1998, Inagi et al.
1997, Omori et al. 1996, 1998). This methodological study investigated the consistency
of two dependent variables using three programs, three analysts, and three frames across
two conditions. The purpose was to examine inter-program and inter-analyst differences
and variability. Because all the factors in this study were fixed, the results obtained are
limited to the factors studied and cannot be generalized to the population. This was
expected because the purpose of this study was to serve as a first step towards developing
new software that answers the methodological problems raised in the literature, the pilot
study and the current study.

The findings of this study indicate that the programs produced
inconsistent results for NGGA or TVFL. Thus, inter-program consistency was not
established. The programs also had different amounts of variability, with Scion Image
being the least variable image analysis program. However, Scion Image had lower image
visibility than Adobe Photoshop and Gimp because of its increased brightness levels.

Analysts produced different results for both NGGA and TVFL; therefore, inter-analyst
consistency was not established. The analysts also had different amounts of
variability.

Strengths  and  Limitations 

Strengths 

This  study  was  designed  to  investigate  the  limitations  of  three  programs  for  the 
measurement  of  laryngeal  dimensions.  While  other  studies  were  conducted  towards 
achieving  the  same  goal,  they  used  one  program,  and  did  not  address  the  consistency  of 
the  program  used. 

Although  the  current  study  used  manual  tracing  of  the  glottal  gap  area  and  the 
total  vocal  fold  length,  it  controlled  for  other  environmental  factors  that  were  not 
controlled  for  in  previous  studies.  These  environmental  factors  included  user  to  screen 
distance,  head  movement,  measurement  time,  user  fatigue,  and  screen  resolution.  The 
current study also tried to account for the multipixel border of the vocal fold by enlarging
the image to 400% of its original size, thus providing the analysts with better visibility of
the individual pixels. Image enlargement did not affect the images' properties or the resulting
measurements. This study emphasized the factor of analyst experience as an active factor.
It  also  used  two  of  the  most  common  dependent  variables  (GGA  and  TVFL)  used  in 
outcome  measurement.  Repeated  trials  were  employed  in  order  to  test  the  consistency  of 
the  improved  method  used  in  this  study  across  three  programs. 
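
One way to see why enlargement need not alter the measurements is sketched below. The sketch assumes that the 400% enlargement is performed by pixel replication (nearest-neighbor zoom); the study does not state which method each program uses, so this is an assumption. The function names and the 2 x 2 sample image are hypothetical.

```python
def enlarge(image, factor=4):
    """Enlarge by pixel replication: each pixel becomes a factor x factor block."""
    enlarged = []
    for row in image:
        wide_row = [value for value in row for _ in range(factor)]
        enlarged.extend(list(wide_row) for _ in range(factor))
    return enlarged


def count_value(image, target):
    """Number of pixels in the image that hold the target value."""
    return sum(row.count(target) for row in image)


# Hypothetical 2 x 2 image: one dark pixel (0) and three bright pixels (255).
original = [[0, 255],
            [255, 255]]
zoomed = enlarge(original, factor=4)
```

Under this assumption no new pixel values are introduced, and every pixel count grows by exactly the square of the zoom factor (16 at 400%), so the proportions of the image are preserved. An interpolating zoom would not have this property.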

Limitations 

The sample of this study is purposeful, i.e., the sample was not randomly selected
because the main interest was to outline methodological problems related to digital image
analysis of the larynx. Caution should be applied to generalization to other programs
employed in the quantification of laryngeal dimensions from endoscopic images.
Inter-program and inter-analyst variability was investigated through examining standard
deviations. This was a function of the factors being fixed, not random. To estimate
variability, all factors should be random and a larger number of programs, analysts and
frames should be used. The findings are based on a restricted number of programs,
analysts and frames because the current study was methodological. The images used were
archival,  permitting  an  open  possibility  that  better  images  could  be  obtained.  Images  used 
in  the  study  contained  a variety  of  image  distortions  that  could  not  be  controlled  for  at 
this  stage.  For  example,  pathological  Frame  3 contained  more  image  distortion  in  its 
periphery  than  the  other  images.  Because  the  issue  of  color  distortion  is  a complex  one, 
color  information  was  discarded  and  grayscale  images  were  used.  This  is  an  advantage  in 
controlling the distortion variables in the current study. However, discarding color
information also removes the hue/saturation information (perception of change in color
due to changes in the wavelength of the perceived light) from the pixels and leaves just
the brightness values. The availability of color information, if color distortion is
minimized, provides additional information to the analyst and/or the automated tracing of
laryngeal structures.
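
The loss of information described above can be illustrated with Python's standard `colorsys` module. In this sketch the HSV "value" channel stands in for brightness. That is an assumption made for illustration, since the grayscale conversion actually applied to the study images may have used a different formula (for example a weighted luminance), under which the two sample colors would not come out equal. The function name and sample colors are hypothetical.

```python
import colorsys


def brightness_only(rgb):
    """Keep only the HSV value (brightness) of an (r, g, b) triple in 0..1."""
    _hue, _saturation, value = colorsys.rgb_to_hsv(*rgb)
    return value


# Two fully saturated sample colors that differ only in hue.
red = (1.0, 0.0, 0.0)
green = (0.0, 1.0, 0.0)

red_hue = colorsys.rgb_to_hsv(*red)[0]
green_hue = colorsys.rgb_to_hsv(*green)[0]
```

The two colors have different hues but the same value, so once hue and saturation are discarded they can no longer be told apart.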


Suggestions  for  Further  Research 

This  research  contributes  to  clarifying  the  methodological  problems  encountered 
in  digital  image  analysis  of  the  larynx.  Further  research  using  a randomly  selected  sample 
of  digital  image  analysis  programs,  analysts,  and  image  frames  is  essential  to  generalize 
to  the  population,  especially  in  terms  of  variability.  Also,  images  to  be  used  in  future 
research should be acquired experimentally with a prespecified step-by-step clinical
method.  If  Scion  Image  is  to  be  used  as  a platform  for  new  software,  care  should  be  taken 
to  enhance  its  image  display  and  image  manipulation  capabilities.  Color  distortion  should 
also  be  investigated  because  color  information  in  endoscopic  images  of  the  larynx  is 
essential  for  the  assessment  of  normal  as  well  as  pathological  frames.  Laryngeal  frames 
are  predominantly  red  in  color  due  to  tissue  vascularization  and  the  change  in  "redness"  is 
an  important  factor  in  assessing  laryngeal  pathology.  Other  dimensions  of  the  laryngeal 
structures,  such  as  vocal  fold  angle  and  glottal  gap  width,  should  also  be  incorporated  in 
future laryngeal digital image measurements. Further research should concentrate on
incorporating the results of this study into the development of a tool that can be used to
define the structure of the laryngeal anatomy and laryngeal pathological lesions using an
accurate image analysis technique for defining these formations. The new tool should
minimize the reliance on the visual input of the analyst. The human factor will always
contribute to the inaccuracy of edge detection. Visual illusion is also a concern. The new
tool will eventually provide a means for empirically defining the effects of intervention
on laryngeal pathological states, as it will be able to accurately document changes in
laryngeal structure as a function of treatment.

Thus, this study recommends investigating Scion Image thoroughly as a potential
platform for the development of the new image analysis tool. The design of the new tool
should analyze the effects of image noise as well as computational and systematic error
on the accuracy of measured results. The design should also include the profiling of
computation error and formulate error functions that can be encoded conveniently in
computer software. In particular, it would be useful to develop simple algorithms in both
Scion Image and Matlab (The MathWorks, Inc., MA), then to compare the results from
the two packages. If they produce similar results, then Matlab can be further developed.
Matlab is the software of choice for algorithm development because it is widely used
throughout the scientific community and is a standard in algorithm research. Also,
developing algorithms in Matlab circumvents the problem of recoding in C or C++, Java,
etc.


Summary  and  Conclusion 

The  previously  discussed  studies  augmented  reports  in  the  literature  questioning 
the  ability  of  analysts  to  achieve  consistent  visual  motor  coordination  when  measuring 
laryngeal  dimensions  from  laryngeal  images.  Although  analysts  were  consistent  within 
themselves,  they  achieved  different  results  due  to  the  varied  level  of  experience  and 
possible  effects  of  optical  illusion. 

It is clear that the programs investigated in this study lack the rigorous
computational abilities to achieve an accurate automated or semi-automated
quantification of endoscopic images. Among the programs investigated, Scion Image
emerged as relatively the most consistent program. It was also the least practically
problematic program because it had the simplest procedures for measuring NGGA and
TVFL. Thus, Scion Image can be considered as a platform for the development of new
software. However, issues relating to image display should be addressed.

The current study agrees with other studies reported in the literature (Casper et al.
1988) in that there are a number of problems with the current techniques of digital image
analysis of the larynx. The currently proposed methods for the analysis of video images
(Gonçalves & Leonard 1998) may be adequate for the quantification of x-ray images of
swallowing and articulatory movement with some modification, such as creating an
artificial reference point within the x-ray image (Goulding-Kushner et al. 1990).
However, the current study disagrees with the research reporting that the current
methodology is adequate for the quantification of laryngeal images (Gonçalves & Leonard 1998).
Because no fixed reference point can be artificially created within the laryngeal structures
using the current methodology, the pitfalls should be accounted for through the
development of a software package using mathematically rigorous operations based on
mathematical modeling using Scion Image, Matlab, or both. It was also concluded that
the human factor is a major source of inconsistency in the results produced.


REFERENCES


Abrams,  R.  (1992).  Coordination  of  eye  and  hand  for  aimed  limb  movement.  In: 
L. Proteau & D. Elliott (Eds). Vision and Motor Control (pp. 129-152).
Amsterdam,  Holland:  Elsevier  Science  Publishers  B.V. 

Adobe  Photoshop  5.5  User  Guide  Supplement  (1999).  CA:  Adobe  Systems  Inc. 

Alberti, P.W. (1978). The diagnostic role of laryngeal stroboscopy.
Otolaryngologic Clinics of North America, 11, 2, 347-354.

Asari, K.V., Kumar, S., & Radhakrishnan, D. (1999). A new approach for
nonlinear distortion correction in endoscopic images based on least
squares estimation. IEEE Transactions on Medical Imaging, 18, 4, 345-354.

Baden, W. (1983). Simple graphics for printer. Dr. Dobb's Journal, 8, 11, 86-90.

Baer, T., Lofquist, A., & McGarr, N. (1983). Laryngeal vibrations: A comparison
between high-speed filming and glottographic techniques. Journal of the
Acoustical Society of America, 73, 4, 1304-1308.

Blaugrund, S.M. (1991). Laryngeal framework surgery. In: C.N. Ford & D.M.
Bless (Eds). Phonosurgery: Assessment and surgical management of voice
disorders (pp. 183-212). NY: Raven Press.

Bless, D.M. (1991). Assessment of laryngeal function. In: C.N. Ford & D.M.
Bless (Eds). Phonosurgery: Assessment and surgical management of voice
disorders (pp. 95-122). NY: Raven Press.


Bless,  D.M.,  Hirano,  M.D.,  & Feder,  R.J.  (1987).  Videostroboscopic  evaluation 
of  the  larynx.  Ear,  Nose  & Throat  Journal,  66, 1,  289. 

Carnahan, H. (1992). Eye, head, and hand coordination during manual aiming.
In: L. Proteau & D. Elliott (Eds). Vision and Motor Control (pp. 179-196).
Amsterdam, Holland: Elsevier Science Publishers B.V.

Casper, J., Brewer, D., & Colton, R. (1988). Pitfalls and problems in fiberoptic
videolaryngoscopy. Journal of Voice, 1, 347-352.




Castleman,  K.R.  (1979).  Digital  image  processing.  Englewood  Cliffs,  NJ: 
Prentice-Hall. 

Colton, R.H., & Casper, J.K. (1996). Understanding voice problems: A
physiological perspective for diagnosis and treatment. 2nd Ed. Baltimore,
MD: Williams & Wilkins.

CompuServe Incorporated (1987). GIF ™: Graphics Interchange Format ™. A
standard defining a mechanism for the storage and transmission of raster-based
graphics information. [On-line]. Available:
http://www.tnt.uni-hannover.de/js/soft/imgproc/fileformats/gif87.txt Last updated June 15,
1987.

Digital  Quantitative  Laryngoscopy  (1999).  [On-line].  Available:  http://www.khni 
kum.rwth-aachen.de/  /mib/mbv/projects/laryngoskopie/  QV_uk.htmlJ 
Last  updated  02.07.1999. 


Eckel,  H.E.,  & Sittel,  C.  (1995).  Morphometry  of  the  larynx  in  horizontal 
sections. American Journal of Otolaryngology, 16, 1, 40-8.

Eckel, H.E., Sittel, C., Walger, M., Sprinzl, G., & Koebke, J. (1993).
Plastination: a new approach to morphological research and instruction
with excised larynges. The Annals of Otology, Rhinology & Laryngology,
102, 9, 660-5.


Frattali, C.M. (1998). Outcome assessment in speech-language pathology. In: A.F.
Johnson & B.H. Jacobson (Eds.). Medical Speech-Language Pathology: A
Practitioner's Guide (pp. 685-701). NY: Thieme.

Frisby, J. (1979). Seeing, Illusion, Brain, and Mind. NY: Oxford
University  Press. 

Gimp User Manual [Electronic manual]. (1999). [On-line]. Available:
http://www.gimp.org/the_gimp_about.html


Gonçalves, M., & Leonard, R. (1998). A hardware-software system for analysis
of  video  images.  Journal  of  Voice,  12,  2,  143-150. 

Goulding-Kushner,  K.J.,  & Contributors  (1990).  Standardization  for  the  reporting 
of  nasopharyngoscopy  and  multiview  videofluoroscopy:  A report  from  an 
international  group.  Cleft  Palate  Journal,  27,  4,  337-348. 

Gray,  H.  (1926).  Anatomy  descriptive  and  applied.  24  ed.  London,  UK:  Longman. 




Hahn, C., & Kitzing, P. (1978). Indirect endoscopic photography of the
larynx. A comparison between two newly constructed laryngoscopes. The
Journal of Audiovisual Media in Medicine, 1, 121-130.

Hassan, H., Ilgner, J., Palm, C., Lehmann, T., Spitzer, K., & Westhofen, M.
(1998).  Objective  Judgment  of  Endoscopic  Laryngeal  Images.  In:  T. 
Lehmann,  C.  Palm,  K.  Spitzer,  and  T.  Tolxdorff  (Eds.),  Advances  in 
Quantitative  Laryngoscopy,  Voice  and  Speech  Research  (pp.  135-142). 
Germany:  Aachen,  RWTH. 


Hirano,  M.,  Kurita,  S.,  & Nakashima,  T.  (1983).  Growth,  development  and  aging 
of  the  vocal  folds.  In:  D.M.  Bless  & J.H.  Abbs  (Eds.),  Vocal  Fold 
Physiology  (pp.  22-43).  San  Diego,  CA:  College  Hill  Press. 

Hiroto,  I.  (1976).  Surgical  voice  improvement  for  unilateral  recurrent  laryngeal 
nerve paralysis. Otologia (Fukuoka), 22, 473-474.

Hollien, H., & Moore, P. (1960). Measurements of the vocal folds during
changes in pitch. Journal of Speech and Hearing Research, 3, 157-165.

Image Processing for the Medical Sciences (1998). [On-line]. Available:
http://www.klinikum.rwth-aachen.de/webpages_uk/mib/mbv/bvm98/index_e.html
Last updated on 02-12-1998.

Inagi, K., Khidr, A.A., Ford, C.N., & Bless, D. (1997). Correlation between vocal
functions  and  glottal  measurements  in  patients  with  unilateral  vocal  fold 
paralysis.  Laryngoscope,  107,  782-791. 

Isshiki,  N.,  Morita,  H.,  Okamura,  H.,  & Hiramoto,  M.  (1974).  Thyroplasty  as 
a new  phonosurgical  technique.  Acta-Otolaryngol,  78,  5-6,  451-457. 

Isshiki, N., Okamura, H., & Ishikawa, T. (1975). Thyroplasty type I (lateral
compression) for dysphonia due to vocal cord paralysis or atrophy.
Acta-Otolaryngol, 80, 5-6, 465-473.

Kahane,  J.C.  (1981).  Anatomic  and  physiologic  changes  in  the  aging  peripheral 
speech mechanism. In: D.S. Beasley & G.A. Davis (Eds.). Aging:
Communication processes and Disorders, (pp. 21-45). NY: Grune & Stratton.

Kallen,  L.  A.  (1932).  Laryngostroboscopy  in  the  practice  of  otolaryngology. 
Archives  of  Otolaryngology,  16,  791-807. 

Kay Elemetrics website. [On-line]. Available:
http://www.kayelemetrics.com/dvrs.htm




Kent,  R.  (1996).  Hearing  and  believing:  Some  limits  to  the  auditory-perceptual 
assessment of speech and voice disorders. American Journal of
Speech-Language Pathology, 5, 3, 7-23.


Kitzing, P. (1985). Stroboscopy - a pertinent laryngological examination. Journal
of  Otolaryngology,  14,  151-157. 

Kreiman, J., & Gerratt, B. (1998). Validity of rating scale measures of voice
quality. Journal of the Acoustical Society of America, 104, 3 Pt 1, 1598-608.


Kreiman,  J.,  Gerratt,  B.,  Precoda,  K.,  & Berke,  G.  (1992).  Individual  differences 
in  voice  quality  perception.  Journal  of  Speech  and  Hearing  Research,  35, 
3, 512-520.

Kreiman, J., Gerratt, B., & Precoda, K. (1990). Listener experience and perception
of voice quality. Journal of Speech and Hearing Research, 33, 1, 103-115.

Mackenzie,  M.  (1865).  The  use  of  the  laryngoscope  in  diseases  of  the  throat. 
Philadelphia,  PA:  Lindsay  and  Blakiston. 

Montgomery,  W.W.,  & Montgomery,  S.K.  (1997).  Montgomery  thyroplasty 
implant system. The Annals of Otology, Rhinology & Laryngology
- Supplement, 170, 1-16.

Moore,  P.  (1937).  Voice:  A historical  perspective,  a short  history  of  laryngeal 
investigation.  Journal  of  Voice,  5,  3,  266-281. 

Omori,  K.,  Slavit,  D.H.,  Kacker,  A.,  & Blaugrund,  S.M.  (1996).  Quantitative 
criteria  for  predicting  thyroplasty  type  I outcome.  Laryngoscope,  106, 
689-693. 

Omori,  K.,  Slavit,  D.H.,  Kacker,  A.,  & Blaugrund,  S.M.  (1998).  Influence  of  size 
and  etiology  of  glottal  gap  in  glottic  incompetence  dysphonia. 
Laryngoscope,  108,  514-517. 


Padovan,  I.F.,  Christman,  N.T.,  Hamilton,  L.H.,  & Darling,  R.J.  (1973).  Indirect 
microlaryngostroboscopy.  Laryngoscope,  83,  12,  2035-2041. 

Palm, C., Pelkmann, A., Lehmann, T., & Spitzer, K. (1998). Distortion correction
of laryngoscopic images. In: T. Lehmann, C. Palm, K. Spitzer, & T.
Tolxdorff (Eds.), Advances in Quantitative Laryngoscopy, Voice and
Speech Research (pp. 117-126). Germany: Aachen, RWTH.

Price, R. (1997). Image manipulation. In W. Hendee & P. Wells (Eds.), The
perception of visual information (2nd ed.) (pp. 223-249). NY: Springer.




Proteau, L. (1992). On the specificity of learning and the role of visual information
for movement control. In: L. Proteau & D. Elliott (Eds.). Vision and Motor
Control (pp. 67-102). Amsterdam, Holland: Elsevier Science Publishers
B.V.

Russ, J.C. (1999). The image processing handbook (3rd ed.). FL: CRC Press.

Saadah, A., Galatsanos, N., Bless, D., & Ramos, C. (1997). Deformation
analysis of the vocal folds from videostroboscopic image sequences of the
larynx. The Journal of the Acoustical Society of America, 103, 6, 3627-3641.

Sawashima, M., & Hirose, H. (1981). Abduction-adduction of the glottis in
speech and voice production. In K.N. Stevens & M. Hirano (Eds.), Vocal
Fold Physiology (pp. 329-346). Tokyo: University of Tokyo Press.

Schmalz, M.S., & Caimi, F.M. (1996). An introduction to analysis of errors
inherent in multispectral imaging through the sea surface. 1. Target and
media effects. Proceedings SPIE, 2821, 215-226.

Scion Corporation Website. [On-line]. Available:
http://www.scioncorp.com/frames/fr_scion_products.htm MD: Frederick.

Scion Image Beta 4.2 for Windows Manual. [On-line].
http://www.scioncorp.com/frames/fr_scion_products.htm MD: Frederick.

Shaffer,  J.P.  (1986).  Modified  sequentially  rejective  multiple  test  procedures. 
Journal of the American Statistical Association, 81, 826-831.

Shaffer,  J.P.  (1991).  On  the  problem  of  interactions  in  the  analysis  of  variance: 
comment. Journal of the American Statistical Association, 86, 367-369.

Shavelson, R.J. (1981). Statistical Reasoning for Behavioral Sciences. Boston,
MA:  Allyn  and  Bacon. 

Stathopoulos, E.T., & Sapienza, C.M. (1997). Developmental changes in laryngeal
and respiratory function with variations in sound pressure level. Journal
of Speech, Language, and Hearing Research, 40, 595-614.

Stemple,  J.C.  (2000).  Clinical  Voice  Pathology:  Theory  and  Management. 
Columbus, OH: Merrill Publication Co.


Swett, H., Giger, M., & Doi, K. (1997). Computer vision and decision
support. In W. Hendee & P. Wells (Eds.), The perception of visual
information (2nd ed.) (pp. 297-342). NY: Springer.




Third  International  Workshop  (1998).  Advances  in  quantitative  laryngoscopy, 
voice  and  speech  research;  Digital  quantitative  laryngoscopy.  [On-line]. 
Available: http://www.klinikum.rwth-aachen.de/webpages_uk/mib/mpv/lary98
Last modified: 06.15.1998.

Timcke,  R.,  von  Leden,  H.,  & Moore,  G.P.  (1958).  The  laryngeal  vibration: 
Measurement of the glottic wave. Part I, the normal vibratory cycle.
Archives  of  Otolaryngology,  68,  1-9. 

Titze,  I.R.  (1989).  Physiologic  and  acoustic  differences  between  male  and  female 
voices.  Journal  of  the  Acoustical  Society  of  America,  85,  1699-1707. 

Titze, I.R. (1993). Vocal fold physiology: Frontiers in Basic Science. San Diego,
CA: Singular Publishing Group.

Titze, I.R. (1994). Principles of Voice Production. NJ: Prentice Hall.

Tor  Lillqvist's  homepage.  [On-line].  Available:  http://user.sgic.fi/~tml/index.html 

Turner, G., & Williams, W.N. (1991). Fluoroscopy and nasoendoscopy in
designing palatal lift prostheses. Journal of Prosthetic Dentistry, 66, 1,
63-70.

Ulrich,  R.  (1993).  Visual  data  formatting.  In  W.  Hendee  & P.  Wells  (Eds.),  The 
perception of visual information (2nd ed.) (pp. 111-222). NY: Springer.

von Leden, H. (1991). Laryngeal framework surgery. In: C.N. Ford & D.M.
Bless (Eds.). Phonosurgery: Assessment and surgical management of voice
disorders, (pp. 3-24). NY: Raven Press.

Woo, P. (1996). Quantification of videostrobolaryngoscopic findings -
Measurements of the normal glottal cycle. Laryngoscope, 106, 1-19.

Zemlin, W.R. (1998). Speech and Hearing Science: Anatomy and Physiology. (4th
ed.). MA: Allyn and Bacon.


BIOGRAPHICAL  SKETCH 


Mr. Yaser Natour was born on July 26, 1969, in Amman, Jordan. He received a
Bachelor of Arts degree in English Language and Literature and Arabic Language and
Literature (double major) from the University of Jordan in 1991. He graduated from the
University of Jordan in 1995 with a Master of Arts degree in Communication Sciences
and Speech Pathology. After graduation, he was employed as a special needs teacher at
the Amman Baccalaureate School. He then worked as a speech-language pathologist at the
Speech and Hearing Clinic at the University of Jordan. Yaser was awarded a Fulbright
Scholarship to pursue a Doctor of Philosophy in communication sciences and disorders.
In August of 1998, he enrolled in the doctoral program at the University of Florida. His
Ph.D. in communication sciences and disorders was awarded in May 2001.




I certify  that  I have  read  this  study  and  that  in  my  opinion  it  conforms  to 
acceptable  standards  of  scholarly  presentation  and  is  fully  adequate,  in  scope  and  quality, 
as a dissertation for the degree of Doctor of Philosophy.


Christine M. Sapienza
Associate  Professor  of  Communication 
Sciences  and  Disorders 


I certify  that  I have  read  this  study  and  that  in  my  opinion  it  conforms  to 
acceptable  standards  of  scholarly  presentation  and  is  fully  adequate,  in  scope  and  quality, 
as a dissertation for the degree of Doctor of Philosophy.


William  S.  Brown,  Jr. 

Professor  of  Communication  Sciences 
and  Disorders 


I certify  that  I have  read  this  study  and  that  in  my  opinion  it  conforms  to 
acceptable  standards  of  scholarly  presentation  and  is  fully  adequate,  in  scope  and  quality, 
as  a dissertation  for  the  degree  of  Doctor  of  Philosophy. 


Howard  B.  Rothman 
Professor  of  Communication  Sciences 
and  Disorders 


I certify  that  I have  read  this  study  and  that  in  my  opinion  it  conforms  to 
acceptable  standards  of  scholarly  presentation  and  is  fully  adequate,  in  scope  and  quality, 
as  a dissertation  for  the  degree  of  Doctor  of  Philosophy. 

Paul  W.  Davenport 
Professor  of  Physiological  Sciences 

I certify  that  I have  read  this  study  and  that  in  my  opinion  it  conforms  to 
acceptable standards of scholarly presentation and is fully adequate, in scope and quality,
as a dissertation for the degree of Doctor of Philosophy.


Mark S. Schmalz
Assistant Scientist, Computer and
Information Science and Engineering


This  dissertation  was  submitted  to  the  Graduate  Faculty  of  the  Department  of 
Communication  Sciences  and  Disorders  in  the  College  of  Liberal  Arts  and  Sciences  and 
to  the  Graduate  School  and  was  accepted  as  partial  fulfillment  of  the  requirements  for  the 
degree  of  Doctor  of  Philosophy. 

May  2001 

Dean,  Graduate  School 


